[GitHub] kafka-site pull request #45: Manual edits needed for 0.10.2 release

2017-02-03 Thread guozhangwang
GitHub user guozhangwang opened a pull request:

https://github.com/apache/kafka-site/pull/45

Manual edits needed for 0.10.2 release



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/guozhangwang/kafka-site KRelease-0.10.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka-site/pull/45.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #45


commit 10ac0f402ca67629a274bd158ec25cf3897ca3f7
Author: Guozhang Wang 
Date:   2017-02-04T02:53:13Z

add quickstart for template

commit 153ce91fd9ad36bd76290be547708de1f2d9ab92
Author: Guozhang Wang 
Date:   2017-02-04T03:15:06Z

manual steps needed for 0.10.2 release






Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression

2017-02-03 Thread Jeff Widman
*My concern is that this minor improvement (based on the benchmark) over
LZ4 does not warrant the work of adding support for a new compression codec
to the broker, official clients and horde of 3rd party clients, including
upgrade paths, transformations, tests, additional dependencies, etc.*


As someone who primarily works with non-Java clients, I couldn't +1 this
enough.

If there's truly a benefit (and probably there is...the Facebook post
announcing ZStandard had some fairly compelling benchmarks) then yes, let's
do this. But otherwise, please no.

On Tue, Jan 31, 2017 at 1:47 AM, Magnus Edenhill  wrote:

> Hi Dongjin and good work on the KIP,
>
> I understand that ZStandard is generally considered an improvement over
> LZ4, but the
> benchmark you provided on the KIP-110 wiki doesn't really reflect that, and
> even
> makes a note that they are comparable:
> *> As you can see above, ZStandard shows outstanding performance in both of
> compression rate and speed, especially working with the speed-first setting
> (level 1). To the extent that only LZ4 can be compared to ZStandard.*
>
> My concern is that this minor improvement (based on the benchmark) over LZ4
> does not warrant the work
> of adding support for a new compression codec to the broker, official
> clients and horde of 3rd party clients, including
> upgrade paths, transformations, tests, additional dependencies, etc.
>
> Is it possible to produce more convincing comparisons?
>
> Thanks,
> Magnus
>
>
>
>
>
> 2017-01-31 10:28 GMT+01:00 Dongjin Lee :
>
> > Ismael & All,
> >
> > After inspecting the related code & commits, I concluded the following:
> >
> > 1. We have to update the masking value that is used to retrieve the codec
> > id from messages, so that the 3rd bit of the message's compression type
> > field can be read.
> > 2. The above task is already done, so no change is needed.
> >
> > Here is why.
> >
> > Let's start with the first one, with the scenario Juma proposed. In the
> > case of receiving a message with an unsupported compression type, the
> > receiver (= broker or consumer) raises IllegalArgumentException[^1][^2].
> > The key element in this operation is Record#COMPRESSION_CODEC_MASK, which
> > is used to extract the codec id. We would have to update this value from a
> > 2-bit extractor (=0x03) to a 3-bit extractor (=0x07).
> >
> > But in fact, this task is already done, so its current value is 0x07. We
> > don't have to update it.
> >
> > There is some history behind why this task is already done. From the time
> > the Record.java file was first added to the project[^3],
> > COMPRESSION_CODEC_MASK was already a 2-bit extractor, that is, 0x03. At
> > that time, Kafka supported only two compression types - GZipCompression
> > and SnappyCompression.[^4] After that, KAFKA-1456 introduced two
> > additional codecs, LZ4 and LZ4C[^5]. That update changed
> > COMPRESSION_CODEC_MASK into a 3-bit extractor, 0x07, with the aim of
> > supporting four compression codecs.
> >
> > Although the follow-up work, KAFKA-1493, removed support for the LZ4C
> > codec[^6], the mask was not reverted to a 2-bit extractor - for this
> > reason, we don't need to worry about the message format.
> >
> > The attached screenshot is my test of Juma's scenario. I created a topic &
> > sent some messages using the snapshot version with ZStandard compression,
> > and received the messages with the latest version. As you can see, it works
> > perfectly as expected.
> >
> > If you have further opinions on this issue, don't hesitate to send me a
> > message.
> >
> > Best,
> > Dongjin
> >
> > [^1]: see: Record#compressionType.
> > [^2]: There is a similar routine in Message.scala, but after KAFKA-4390
> > that routine is no longer used - more precisely, the Message class is now
> > used only in ConsoleConsumer. I think this class should be replaced, but
> > since that is a separate topic, I will send another message about it.
> > [^3]: commit 642da2f (2011.8.2).
> > [^4]: commit c51b940.
> > [^5]: commit 547cced.
> > [^6]: commit 37356bf.
> >
> > On Thu, Jan 26, 2017 at 12:35 AM, Ismael Juma  wrote:
> >
> >> So far the discussion was around the performance characteristics of the
> >> new
> >> compression algorithm. Another area that is important and is not covered
> >> in
> >> the KIP is the compatibility implications. For example, what happens if
> a
> >> consumer that doesn't support zstd tries to consume a topic compressed
> >> with
> >> it? Or if a broker that doesn't support it receives data compressed with
> it?
> >> If we go through that exercise, then more changes may be required (like
> >> bumping the version of produce/fetch protocols).
> >>
> >> Ismael
> >>
> >> On Wed, Jan 25, 2017 at 3:22 PM, Ben Stopford  wrote:
> >>
> >> > Is there more discussion to be had on this KIP, or should it be taken
> >> to a
> >> > vote?
> >> >
> >> > On Mon, Jan 16, 2017 at 6:37 AM Dongjin 
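For concreteness, a minimal sketch of the 3-bit codec-id extraction described
in Dongjin's message above (the constant value matches his description of
Record#COMPRESSION_CODEC_MASK; the surrounding class and the example codec id
are illustrative, not Kafka's actual code):

{code}
// Sketch of the codec-id extraction discussed in the thread. The 3-bit
// mask already admits up to eight codec ids, so a new id (such as one for
// ZStandard) needs no message-format change.
public class CompressionCodecMaskSketch {

    static final int COMPRESSION_CODEC_MASK = 0x07; // 3-bit extractor

    static int codecId(byte attributes) {
        return attributes & COMPRESSION_CODEC_MASK;
    }

    public static void main(String[] args) {
        byte attributes = 0x04; // hypothetical codec id using the 3rd bit
        System.out.println(codecId(attributes)); // prints 4
    }
}
{code}

Because the mask already covers three bits, a fourth codec id fits without any
change to the message format - which is the point of the analysis above.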

[jira] [Commented] (KAFKA-4654) Improve test coverage MemoryLRUCache

2017-02-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852512#comment-15852512
 ] 

ASF GitHub Bot commented on KAFKA-4654:
---

GitHub user bbejeck opened a pull request:

https://github.com/apache/kafka/pull/2500

KAFKA-4654 improve test coverage for MemoryLRUCacheStore



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bbejeck/kafka 
KAFKA-4654_improve_MemroryLRUCache_test_coverage

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2500.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2500


commit abb7325291484cd1799cdc5826fa2f9b313a2867
Author: bbejeck 
Date:   2017-02-04T02:43:05Z

KAFKA-4654 improve test coverage for MemoryLRUCacheStore




> Improve test coverage MemoryLRUCache
> 
>
> Key: KAFKA-4654
> URL: https://issues.apache.org/jira/browse/KAFKA-4654
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Damian Guy
>Assignee: Bill Bejeck
>Priority: Minor
> Fix For: 0.10.3.0
>
>
> {{putAll}} - not covered





[jira] [Commented] (KAFKA-4461) When using ProcessorTopologyTestDriver, the combination of map and .groupByKey does not produce any result

2017-02-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852496#comment-15852496
 ] 

ASF GitHub Bot commented on KAFKA-4461:
---

GitHub user amccague opened a pull request:

https://github.com/apache/kafka/pull/2499

KAFKA-4461 Added support to ProcessorTopologyTestDriver for internal topics.

This resolves an issue in driving tests using the 
ProcessorTopologyTestDriver when `groupBy()` is invoked downstream of a 
processor that flags repartitioning.

Ticket: https://issues.apache.org/jira/browse/KAFKA-4461
Discussion: http://search-hadoop.com/m/Kafka/uyzND1wbKeY1Q8nH1

@dguy @guozhangwang 

The contribution is my original work and I license the work to the project 
under the project's open source license.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/amccague/kafka 
KAFKA-4461_ProcessorTopologyTestDriver_map_groupbykey

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2499.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2499


commit 938b8bb8b8981ea25bacc7389565e15604c4dcba
Author: Adrian McCague 
Date:   2017-02-04T02:09:35Z

KAFKA-4461 Added support to ProcessorTopologyTestDriver for internal topics.
Ticket: https://issues.apache.org/jira/browse/KAFKA-4461
Discussion: http://search-hadoop.com/m/Kafka/uyzND1wbKeY1Q8nH1




> When using ProcessorTopologyTestDriver, the combination of map and 
> .groupByKey does not produce any result
> --
>
> Key: KAFKA-4461
> URL: https://issues.apache.org/jira/browse/KAFKA-4461
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Hamidreza Afzali
>  Labels: newbie, unit-test
>
> *Problem*
> When using ProcessorTopologyTestDriver in the latest Kafka 0.10.1, the 
> combination of map and .groupByKey does not produce any result. However, it 
> works fine when using KStreamTestDriver.
> The topology looks like this:
> {code}
> builder.stream(Serdes.String, Serdes.Integer, inputTopic)
>  .map((k, v) => new KeyValue(fn(k), v))
>  .groupByKey(Serdes.String, Serdes.Integer)
>  .count(stateStore)
> {code}
> *Full examples*
> Examples for ProcessorTopologyTestDriver and KStreamTestDriver:
> https://gist.github.com/hrafzali/c2f50e7b957030dab13693eec1e49c13
> *Additional info*
> kafka-users mailing list:
> http://mail-archives.apache.org/mod_mbox/kafka-users/201611.mbox/%3CCAHwHRrVq1APVkNhP3HVqxujxRJEP9FwHV2NRcvPPsHX7Wujzng%40mail.gmail.com%3E





[GitHub] kafka pull request #2499: KAFKA-4461 Added support to ProcessorTopologyTestD...

2017-02-03 Thread amccague
GitHub user amccague opened a pull request:

https://github.com/apache/kafka/pull/2499

KAFKA-4461 Added support to ProcessorTopologyTestDriver for internal topics.

This resolves an issue in driving tests using the 
ProcessorTopologyTestDriver when `groupBy()` is invoked downstream of a 
processor that flags repartitioning.

Ticket: https://issues.apache.org/jira/browse/KAFKA-4461
Discussion: http://search-hadoop.com/m/Kafka/uyzND1wbKeY1Q8nH1

@dguy @guozhangwang 

The contribution is my original work and I license the work to the project 
under the project's open source license.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/amccague/kafka 
KAFKA-4461_ProcessorTopologyTestDriver_map_groupbykey

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2499.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2499


commit 938b8bb8b8981ea25bacc7389565e15604c4dcba
Author: Adrian McCague 
Date:   2017-02-04T02:09:35Z

KAFKA-4461 Added support to ProcessorTopologyTestDriver for internal topics.
Ticket: https://issues.apache.org/jira/browse/KAFKA-4461
Discussion: http://search-hadoop.com/m/Kafka/uyzND1wbKeY1Q8nH1






Re: [VOTE] KIP-111 Kafka should preserve the Principal generated by the PrincipalBuilder while processing the request received on socket channel, on the broker.

2017-02-03 Thread radai
+1

On Fri, Feb 3, 2017 at 11:25 AM, Mayuresh Gharat  wrote:

> Hi All,
>
> It seems that there are no further concerns with KIP-111. At this point
> we would like to start the voting process. The KIP can be found at
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=67638388
>
> Thanks,
>
> Mayuresh
>


Re: [DISCUSS] KIP-117: Add a public AdministrativeClient API for Kafka admin operations

2017-02-03 Thread Dong Lin
Thanks for the reply, Colin. I have some comments inline.

In addition, I also have some comments regarding Future in response to your
latest email. As Ismael mentioned, we have added a purgeDataBefore() API in
AdminClient. This API returns a Future, which allows the user to purge data
in either a sync or async manner. And we have presented use cases for both
sync and async usage of this API in the discussion thread of KIP-107. I
think we should at least return a Future object in this case, right?

As you mentioned, we can transform a blocking API into a Futures-based API
by using a thread pool. But a thread pool seems inconvenient compared to
using future.get(), which transforms a Future-based API into a blocking
API. Wouldn't that be a good reason to return a Future for every API where
we need both sync and async modes?
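A minimal sketch of the sync/async duality being argued for here (the method
below is a stand-in, not the actual purgeDataBefore() signature; only the
Future pattern is the point):

{code}
import java.util.concurrent.CompletableFuture;

// Sketch only: a future-returning admin call covers both usage styles.
class PurgeSketch {
    static CompletableFuture<Void> purgeDataBefore(long offset) {
        return CompletableFuture.runAsync(
                () -> System.out.println("purged up to " + offset));
    }

    public static void main(String[] args) throws Exception {
        // Sync style: block on the future.
        purgeDataBefore(100L).get();

        // Async style: attach a callback and continue.
        purgeDataBefore(200L).thenRun(() -> System.out.println("done"));
        Thread.sleep(100); // let the async callback fire before exiting
    }
}
{code}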


On Fri, Feb 3, 2017 at 10:20 AM, Colin McCabe  wrote:

> On Thu, Feb 2, 2017, at 17:54, Dong Lin wrote:
> > Hey Colin,
> >
> > Thanks for the KIP. I have a few comments below:
> >
> > - I share a similar view with Ismael that a Future-based API is better.
> > PurgeDataFrom() is an example of an API that a user may want to call
> > asynchronously even though there is only one request in flight at a time.
> > In the future we may also have some admin operation that allows the user
> > to move a replica from one broker to another, which also needs to work in
> > both sync and async styles. It seems more generic to return a Future for
> > any API that requires both modes.
> >
> > - I am not sure if it is the cleanest way to have enum classes
> > CreateTopicsFlags and DeleteTopicsFlags. Are we going to create such a
> > class for every future API that requires configuration? It may be more
> > generic to provide a Map to any admin API that operates on multiple
> > entries. For example, deleteTopic(Map). And it can be Map<String,
> > Properties> for those APIs that require multiple configs per entry. And we
> > can provide a default value, doc, and config name for those APIs as we do
> > with AbstractConfig.
>
> Thanks for the comments, Dong L.
>
> EnumSet<CreateTopicsFlags>, EnumSet<DeleteTopicsFlags>, and so forth are
> type-safe ways for the user to pass a set of boolean flags to the
> function.  It is basically a replacement for having an API like
> createTopics(..., boolean nonblocking, boolean validateOnly).  It is
> preferable to having the boolean flags because it's immediately clear
> when reading the code what createTopics(...,
> CreateTopicsFlags.VALIDATE_ONLY) means, whereas it is not immediately
> clear what createTopics(..., false, true) means.  It also prevents
> having lots of function overloads over time, which becomes confusing for
> users.  The EnumSet is not intended as a replacement for all possible
> future arguments we might add, but just an easy way to add more boolean
> arguments later without adding messy function overloads or type
> signatures with many booleans.
>
> Map is not type-safe, and I don't think we should use it in
> our APIs.  Instead, we should just add function overloads if necessary.
>

I agree that using EnumSet is type-safe.
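A minimal sketch of the EnumSet-flags pattern Colin describes (the enum values
mirror the thread's examples; the createTopics() stand-in is illustrative, not
the proposed AdminClient signature):

{code}
import java.util.EnumSet;

// Sketch of the flags-as-EnumSet pattern under discussion.
class EnumSetFlagsSketch {
    enum CreateTopicsFlags { NONBLOCKING, VALIDATE_ONLY }

    static void createTopics(String topic, EnumSet<CreateTopicsFlags> flags) {
        if (flags.contains(CreateTopicsFlags.VALIDATE_ONLY)) {
            System.out.println("validate only: " + topic);
        }
    }

    public static void main(String[] args) {
        // Reads clearly at the call site, unlike createTopics(..., false, true).
        createTopics("my-topic", EnumSet.of(CreateTopicsFlags.VALIDATE_ONLY));
        createTopics("other", EnumSet.noneOf(CreateTopicsFlags.class));
    }
}
{code}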


>
> >
> > - I'm not sure if "Try" is very intuitive to a Java developer. Is there any
> > counterpart of Scala's "Try" in Java?
>
> Unfortunately, there is no equivalent to Try in the standard Java
> library.  That is the first place I checked, and I spent a while
> searching.  The closest is probably Java 8's Optional.  However,
> Optional just allows us to express Some(thing) or None, so callers would
> not be able to determine what the error was.


> > We actually have quite a few
> > existing
> > classes in Kafka that address the same problem, such as
> > ProduceRequestResult, LogAppendResult, etc. Maybe we can follow the same
> > convention and use *Result as this class name.
>
> Hmm.  ProduceRequestResult and LogAppendResult just store an exception
> alongside the data, and make the caller responsible for checking whether
> the exception is null (or None) before looking at the data.  I don't
> think this is a good solution, because if the user forgets to do this,
> they will interpret whatever is in the data fields as valid, even when
> it is not.  ProduceRequestResult and LogAppendResult are also internal
> classes, not user-facing APIs, so we did not spend time thinking about
> how to make them easy to use for end-users.
>

I am not actually suggesting that we use LogAppendResult directly. I am
wondering if it would be more intuitive for developers to name this class
something like OperationResult instead of Try. The OperationResult can
store whatever the current "Try" class proposed in the KIP stores. It is
just my personal opinion that "Try" doesn't directly tell me what the class
actually does. But I am not too worried about it if you think "Try" is
better.


>
> >
> > - How are we going to allow the user to specify a timeout for blocking APIs
> > such
> > as deleteTopic? Is this configured per AdminClient, or 

[GitHub] kafka pull request #2498: MINOR: Make Kafka Connect step in quickstart more ...

2017-02-03 Thread vahidhashemian
GitHub user vahidhashemian opened a pull request:

https://github.com/apache/kafka/pull/2498

MINOR: Make Kafka Connect step in quickstart more Windows friendly



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vahidhashemian/kafka 
doc/connect_quickstart_update

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2498.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2498


commit 4f645fab1a906d7bf56d954c4b87d24e9fb4
Author: Vahid Hashemian 
Date:   2017-02-03T23:49:26Z

MINOR: Make Kafka Connect step in quickstart more Windows friendly






[GitHub] kafka pull request #2497: [MINOR] fixed some JavaDoc typos

2017-02-03 Thread mjsax
GitHub user mjsax opened a pull request:

https://github.com/apache/kafka/pull/2497

[MINOR] fixed some JavaDoc typos



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mjsax/kafka hofixJavaDocs0102

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2497.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2497


commit ab1c8a325b1d283630634133f989d70f60618fbb
Author: Matthias J. Sax 
Date:   2017-02-03T23:43:46Z

[MINOR] fixed some JavaDoc typos






Re: [VOTE] 0.10.2.0 RC0

2017-02-03 Thread Vahid S Hashemian
Hi Ewen,

I tested the quickstart with this release candidate on Ubuntu, Windows (32 
and 64 bit) and Mac.
Everything looks good, except for one issue on Windows 7 32 bit when 
running the WordCount demo application from the Kafka Streams step.

Running that command gave me this error, and as a result, no data was 
written to the output topic:

[2017-02-03 14:11:23,444] WARN Unexpected state transition from 
ASSIGNING_PARTITIONS to NOT_RUNNING 
(org.apache.kafka.streams.processor.internals.StreamThread)
Exception in thread "StreamThread-1" java.lang.ExceptionInInitializerError
at org.rocksdb.RocksDB.loadLibrary(RocksDB.java:64)
at org.rocksdb.RocksDB.<clinit>(RocksDB.java:35)
at org.rocksdb.Options.<clinit>(Options.java:22)
at 
org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:115)
at 
org.apache.kafka.streams.state.internals.RocksDBStore.init(RocksDBStore.java:148)
at 
org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStore.init(ChangeLoggingKeyValueBytesStore.java:39)
at 
org.apache.kafka.streams.state.internals.MeteredKeyValueStore$7.run(MeteredKeyValueStore.java:100)
at 
org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:188)
at 
org.apache.kafka.streams.state.internals.MeteredKeyValueStore.init(MeteredKeyValueStore.java:131)
at 
org.apache.kafka.streams.state.internals.CachingKeyValueStore.init(CachingKeyValueStore.java:62)
at 
org.apache.kafka.streams.processor.internals.AbstractTask.initializeStateStores(AbstractTask.java:86)
at 
org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:141)
at 
org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:834)
at 
org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:1207)
at 
org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:1180)
at 
org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:937)
at 
org.apache.kafka.streams.processor.internals.StreamThread.access$500(StreamThread.java:69)
at 
org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:236)
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:255)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:339)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:303)
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:286)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1030)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995)
at 
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:582)
at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:368)
Caused by: java.lang.UnsupportedOperationException
at 
org.rocksdb.util.Environment.getJniLibraryName(Environment.java:48)
at 
org.rocksdb.NativeLibraryLoader.<clinit>(NativeLibraryLoader.java:19)
... 26 more
[2017-02-03 14:11:26,319] WARN Unexpected state transition from 
NOT_RUNNING to PENDING_SHUTDOWN 
(org.apache.kafka.streams.processor.internals.StreamThread)


This step worked fine on the other environments. I'm not sure if it is due 
to a bug, or if it's isolated to my test environment.

--Vahid




From:   Ewen Cheslack-Postava 
To: dev@kafka.apache.org, "us...@kafka.apache.org" 
, "kafka-clie...@googlegroups.com" 

Date:   02/01/2017 03:45 PM
Subject:[VOTE] 0.10.2.0 RC0



Hello Kafka users, developers and client-developers,

This is the first candidate for release of Apache Kafka 0.10.2.0.

This is a minor version release of Apache Kafka. It includes 19 new KIPs.
See the release notes and release plan (
https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+0.10.2.0)
for more details. A few feature highlights: SASL-SCRAM support, improved
client compatibility to allow use of clients newer than the broker, 
session
windows and global tables in the Kafka Streams API, single message
transforms in the Kafka Connect framework.

Release notes for the 0.10.2.0 release:
http://home.apache.org/~ewencp/kafka-0.10.2.0-rc0/RELEASE_NOTES.html

*** Please download, test and vote by Monday February 6th 5pm PST ***
(Note the longer window to vote to account for the normal 7 days ending in
the middle of the weekend.)

Kafka's KEYS file containing PGP keys we use to sign the release:

[jira] [Assigned] (KAFKA-4733) Improve Streams Reset Tool console output

2017-02-03 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira reassigned KAFKA-4733:
---

Assignee: Gwen Shapira

> Improve Streams Reset Tool console output
> -
>
> Key: KAFKA-4733
> URL: https://issues.apache.org/jira/browse/KAFKA-4733
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams, tools
>Reporter: Matthias J. Sax
>Assignee: Gwen Shapira
>Priority: Minor
>  Labels: beginner, easyfix, newbie
>
> Currently, the console output of {{bin/kafka-streams-application-reset.sh}} 
> is not helpful enough to users:
> - we should add a hint to clean up local state using 
> {{KafkaStreams#cleanup()}}
> - we should clarify what to specify for each parameter (i.e., what is an input 
> topic, what is an intermediate topic)
> - we should clarify that it is not required to specify internal topics (and 
> what those are)
> - we should clarify what the tool does for the different topics, i.e., 
> seek+commit, delete, etc.





[GitHub] kafka-site pull request #43: [DO NOT MERGE] manual changes after copy-paste ...

2017-02-03 Thread guozhangwang
Github user guozhangwang closed the pull request at:

https://github.com/apache/kafka-site/pull/43




[DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-02-03 Thread Matthias J. Sax
Hi All,

I prepared a KIP to clean up some of Kafka's Streams API.

Please have a look here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-120%3A+Cleanup+Kafka+Streams+builder+API

Looking forward to your feedback!


-Matthias





[jira] [Commented] (KAFKA-4734) timeindex on old segments not trimmed to actual size

2017-02-03 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852317#comment-15852317
 ] 

Jun Rao commented on KAFKA-4734:


[~becket_qin], do you think you could take a look at this? Thanks.

> timeindex on old segments not trimmed to actual size 
> -
>
> Key: KAFKA-4734
> URL: https://issues.apache.org/jira/browse/KAFKA-4734
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.10.1.0
>Reporter: Jun Rao
>
> When upgrading from 0.9.0 to 0.10.1, the broker creates empty .timeindex 
> files on old log segments without trimming them. So, on disk, you will see 
> .timeindex files with preallocated size.
> -rw-r--r--  1 junrao  wheel  10485756 Feb  3 15:15 
> .timeindex
> -rw-r--r--  1 junrao  wheel  10485756 Feb  3 15:15 
> 0960.timeindex
> -rw-r--r--  1 junrao  wheel  10485756 Feb  3 15:15 
> 1920.timeindex
> -rw-r--r--  1 junrao  wheel  10485756 Feb  3 15:15 
> 2880.timeindex
> -rw-r--r--  1 junrao  wheel  10485756 Feb  3 15:15 
> 3840.timeindex
> -rw-r--r--  1 junrao  wheel  10485760 Feb  3 15:15 4800.index
> -rw-r--r--  1 junrao  wheel  10485756 Feb  3 15:15 
> 4800.timeindex
> If the broker is restarted again, all .timeindex files (except for the one on 
> the active segment) are trimmed to 0 size. It would be better if we do that 
> in the first place.





[jira] [Created] (KAFKA-4734) timeindex on old segments not trimmed to actual size

2017-02-03 Thread Jun Rao (JIRA)
Jun Rao created KAFKA-4734:
--

 Summary: timeindex on old segments not trimmed to actual size 
 Key: KAFKA-4734
 URL: https://issues.apache.org/jira/browse/KAFKA-4734
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 0.10.1.0
Reporter: Jun Rao


When upgrading from 0.9.0 to 0.10.1, the broker creates empty .timeindex files 
on old log segments without trimming them. So, on disk, you will see .timeindex 
files with preallocated size.

-rw-r--r--  1 junrao  wheel  10485756 Feb  3 15:15 
.timeindex
-rw-r--r--  1 junrao  wheel  10485756 Feb  3 15:15 
0960.timeindex
-rw-r--r--  1 junrao  wheel  10485756 Feb  3 15:15 
1920.timeindex
-rw-r--r--  1 junrao  wheel  10485756 Feb  3 15:15 
2880.timeindex
-rw-r--r--  1 junrao  wheel  10485756 Feb  3 15:15 
3840.timeindex
-rw-r--r--  1 junrao  wheel  10485760 Feb  3 15:15 4800.index
-rw-r--r--  1 junrao  wheel  10485756 Feb  3 15:15 
4800.timeindex

If the broker is restarted again, all .timeindex files (except for the one on 
the active segment) are trimmed to 0 size. It would be better if we do that in 
the first place.






[jira] [Updated] (KAFKA-4720) Add KStream.peek(ForeachAction<K,V>)

2017-02-03 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-4720:
---
Labels: needs-kip  (was: )

> Add KStream.peek(ForeachAction<K,V>)
> 
>
> Key: KAFKA-4720
> URL: https://issues.apache.org/jira/browse/KAFKA-4720
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Affects Versions: 0.10.1.1
>Reporter: Steven Schlansker
>  Labels: needs-kip
>
> Java's Stream provides a handy peek method that observes elements in the 
> stream without transforming or filtering them.  While you can emulate this 
> functionality with either a filter or map, peek provides potentially useful 
> semantic information (doesn't modify the stream) and is much more concise.
> Example usage: using Dropwizard Metrics to provide event counters
> {code}
> KStream<Integer, String> s = ...;
> s.map(this::mungeData)
>  .peek((i, s) -> metrics.noteMungedEvent(i, s))
>  .filter(this::hadProcessingError)
>  .print();
> {code}
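The emulation mentioned in the description can be made concrete; a sketch of
the same observation step written with filter instead of peek, mirroring the
example above (mungeData, metrics, and hadProcessingError are the example's
own assumed names):

{code}
// Same side effect as peek, emulated with filter: observe each record and
// return true so nothing is dropped. Sketch mirroring the example above.
s.map(this::mungeData)
 .filter((i, v) -> { metrics.noteMungedEvent(i, v); return true; })
 .filter(this::hadProcessingError)
 .print();
{code}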





[jira] [Created] (KAFKA-4733) Improve Streams Reset Tool console output

2017-02-03 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-4733:
--

 Summary: Improve Streams Reset Tool console output
 Key: KAFKA-4733
 URL: https://issues.apache.org/jira/browse/KAFKA-4733
 Project: Kafka
  Issue Type: Improvement
  Components: streams, tools
Reporter: Matthias J. Sax
Priority: Minor


Currently, the console output of {{bin/kafka-streams-application-reset.sh}} is 
not helpful enough to users:

- we should add a hint to clean up local state using {{KafkaStreams#cleanup()}}
- we should clarify what to specify for each parameter (i.e., what is an input 
topic, what is an intermediate topic)
- we should clarify that it is not required to specify internal topics (and 
what those are)
- we should clarify what the tool does for the different topics, i.e., 
seek+commit, delete, etc.





Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2017-02-03 Thread Jason Gustafson
Hi Tom,

I said this in the voting thread, but can the authors include a section
> about new ACLs if there are going to be ACLs for TransactionalId. It's
> mentioned in the google doc, but I think new ACLs should be in a KIP
> directly.


We've updated the wiki. Can you take a look and let us know if you have
additional concerns?

Thanks,
Jason

On Fri, Feb 3, 2017 at 1:52 PM, Rajini Sivaram 
wrote:

> Hi Jason,
>
> Thank you for the responses. Agree that authorizing transactional.id in
> the
> producer requests will be good enough for version 1. And making it tighter
> in future based on delegation tokens sounds good too.
>
> Regards,
>
> Rajini
>
>
> On Fri, Feb 3, 2017 at 8:04 PM, Jason Gustafson 
> wrote:
>
> > Hey Rajini,
> >
> > Thanks for the questions. Responses below:
> >
> >
> > > 1. Will the transaction coordinator check topic ACLs based on the
> > > requesting client's credentials? Access to transaction logs, topics
> being
> > > added for transaction etc?
> >
> >
> > Good question. I think it makes sense to check topic Write permission
> when
> > adding partitions to the transaction. I'll add this to the document.
> > Perhaps authorization to the transaction log itself, however, can be
> > assumed from having access to the ProducerTransactionalId resource? This
> > would be similar to how access to __consumer_offsets is assumed if the
> > client has access to the Group resource.
> >
> > 2. If I create a transactional produce request (by hand, not using the
> > > producer API) with a random PID (random, hence unlikely to be in use),
> > will
> > > the broker append a transactional message to the logs, preventing LSO
> > from
> > > moving forward? What validation will broker do for PIDs?
> >
> >
> > Yes, that is correct. Validation of the TransactionalId to PID binding
> is a
> > known gap in the current proposal, and is discussed in the design
> document.
> > Now that I'm thinking about it a bit more, I think there is a good case
> for
> > including the TransactionalId in the ProduceRequest (I think Jun
> suggested
> > this previously). Verifying it does not ensure that the included PID is
> > correct, but it does ensure that the client is authorized to use
> > transactions. If the client wanted to do an "endless transaction attack,"
> > having Write access to the topic and an authorized transactionalID is all
> > they would need anyway even if we could authorize the PID itself. This
> > seems like a worthwhile improvement.
> >
> > For future work, my half-baked idea to authorize the PID binding is to
> > leverage the delegation work in KIP-48. When the PID is generated, we can
> > give the producer a token which is then used in produce requests (say an
> > hmac covering the TransactionalId and PID).
> >
> >
> > > 3. Will every broker check that a client sending transactional produce
> > > requests at least has write access to transaction log topic since it is
> > not
> > > validating transactional.id (for every produce request)?
> >
> >  4. I understand that brokers cannot authorize the transactional id for
> > each
> > > produce request since requests contain only the PID. But since there
> is a
> > > one-to-one mapping between PID and transactional.id, and a connection
> is
> > > never expected to change its transactional.id, perhaps it is feasible
> to
> > > add authorization and cache the results in the Session? Perhaps not for
> > > version 1, but feels like it will be good to close the security gap
> here.
> > > Obviously it would be simpler if transactional.id was in the produce
> > > request if the overhead was acceptable.
> >
> >
> > I think my response above addresses both of these. We should include the
> > TransactionalId in the ProduceRequest. Of course it need not be included
> in
> > the message format, so I'm not too concerned about the additional
> overhead
> > it adds.
> >
> > Thanks,
> > Jason
> >
> >
> > On Fri, Feb 3, 2017 at 6:52 AM, Ismael Juma  wrote:
> >
> > > Comments inline.
> > >
> > > On Thu, Feb 2, 2017 at 6:28 PM, Jason Gustafson 
> > > wrote:
> > >
> > > > Took me a while to remember why we didn't do this. The timestamp that
> > is
> > > > included at the message set level is the max timestamp of all
> messages
> > in
> > > > the message set as is the case in the current message format (I will
> > > update
> > > > the document to make this explicit). We could make the message
> > timestamps
> > > > relative to the max timestamp, but that makes serialization a bit
> > awkward
> > > > since the timestamps are not assumed to be increasing sequentially or
> > > > monotonically. Once the messages in the message set had been
> > determined,
> > > we
> > > > would need to go back and adjust the relative timestamps.
> > > >
> > >
> > > Yes, I thought this would be a bit tricky and hence why I mentioned the
> > > option of adding a new field at the message set level for the first
> > > 
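Jason's "hmac covering the TransactionalId and PID" token idea earlier in this
message can be sketched with the standard javax.crypto API (key management and
the verification flow are assumptions, not part of the KIP):

{code}
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch of a token binding a TransactionalId to a PID via an HMAC.
class PidTokenSketch {
    static String token(byte[] key, String transactionalId, long pid) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        mac.update(transactionalId.getBytes(StandardCharsets.UTF_8));
        mac.update(Long.toString(pid).getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(mac.doFinal());
    }

    public static void main(String[] args) throws Exception {
        byte[] brokerKey = "demo-secret".getBytes(StandardCharsets.UTF_8);
        // The producer would present this token with produce requests; a broker
        // recomputes it from the claimed TransactionalId and PID and compares.
        System.out.println(token(brokerKey, "my-txn-id", 42L));
    }
}
{code}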

[jira] [Updated] (KAFKA-3856) Move inner classes accessible only functions in TopologyBuilder out of public APIs

2017-02-03 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-3856:
---
Description: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-120%3A+Cleanup+Kafka+Streams+builder+API
  (was: In {{TopologyBuilder}} there are a couple of public functions that are 
actually only used in the internal classes such as StreamThread and 
StreamPartitionAssignor, and some accessible only in high-level DSL inner 
classes, examples include {{addInternalTopic}}, {{sourceGroups}} and 
{{copartitionGroups}}, etc. But they are still listed in Javadocs since this 
class is part of public APIs.

We should think about moving them out of the public functions. Unfortunately 
there is no "friend" access mode as in C++, so we need to think of another way.)

> Move inner classes accessible only functions in TopologyBuilder out of public 
> APIs
> --
>
> Key: KAFKA-3856
> URL: https://issues.apache.org/jira/browse/KAFKA-3856
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Matthias J. Sax
>  Labels: api, kip
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-120%3A+Cleanup+Kafka+Streams+builder+API





[jira] [Updated] (KAFKA-3856) Cleanup Kafka Streams builder API

2017-02-03 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-3856:
---
Summary: Cleanup Kafka Streams builder API  (was: Move inner classes 
accessible only functions in TopologyBuilder out of public APIs)

> Cleanup Kafka Streams builder API
> -
>
> Key: KAFKA-3856
> URL: https://issues.apache.org/jira/browse/KAFKA-3856
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Matthias J. Sax
>  Labels: api, kip
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-120%3A+Cleanup+Kafka+Streams+builder+API





Build failed in Jenkins: kafka-trunk-jdk8 #1249

2017-02-03 Thread Apache Jenkins Server
See 

Changes:

[wangguoz] KAFKA-3452 Follow-up: Optimize ByteStore Scenarios

[jason] HOTFIX: Verifiable producer request timeout needs conversion to

--
[...truncated 968 lines...]

kafka.server.ServerGenerateBrokerIdTest > testAutoGenerateBrokerId STARTED

kafka.server.ServerGenerateBrokerIdTest > testAutoGenerateBrokerId PASSED

kafka.server.ServerGenerateBrokerIdTest > testMultipleLogDirsMetaProps STARTED

kafka.server.ServerGenerateBrokerIdTest > testMultipleLogDirsMetaProps PASSED

kafka.server.ServerGenerateBrokerIdTest > testDisableGeneratedBrokerId STARTED

kafka.server.ServerGenerateBrokerIdTest > testDisableGeneratedBrokerId PASSED

kafka.server.ServerGenerateBrokerIdTest > testUserConfigAndGeneratedBrokerId 
STARTED

kafka.server.ServerGenerateBrokerIdTest > testUserConfigAndGeneratedBrokerId 
PASSED

kafka.server.ServerGenerateBrokerIdTest > 
testConsistentBrokerIdFromUserConfigAndMetaProps STARTED

kafka.server.ServerGenerateBrokerIdTest > 
testConsistentBrokerIdFromUserConfigAndMetaProps PASSED

kafka.server.OffsetCommitTest > testUpdateOffsets STARTED

kafka.server.OffsetCommitTest > testUpdateOffsets PASSED

kafka.server.OffsetCommitTest > testLargeMetadataPayload STARTED

kafka.server.OffsetCommitTest > testLargeMetadataPayload PASSED

kafka.server.OffsetCommitTest > testOffsetsDeleteAfterTopicDeletion STARTED

kafka.server.OffsetCommitTest > testOffsetsDeleteAfterTopicDeletion PASSED

kafka.server.OffsetCommitTest > testCommitAndFetchOffsets STARTED

kafka.server.OffsetCommitTest > testCommitAndFetchOffsets PASSED

kafka.server.OffsetCommitTest > testOffsetExpiration STARTED

kafka.server.OffsetCommitTest > testOffsetExpiration PASSED

kafka.server.OffsetCommitTest > testNonExistingTopicOffsetCommit STARTED

kafka.server.OffsetCommitTest > testNonExistingTopicOffsetCommit PASSED

kafka.api.SaslPlainSslEndToEndAuthorizationTest > 
testTwoConsumersWithDifferentSaslCredentials STARTED

kafka.api.SaslPlainSslEndToEndAuthorizationTest > 
testTwoConsumersWithDifferentSaslCredentials PASSED

kafka.api.SaslPlainSslEndToEndAuthorizationTest > 
testNoConsumeWithoutDescribeAclViaSubscribe STARTED

kafka.api.SaslPlainSslEndToEndAuthorizationTest > 
testNoConsumeWithoutDescribeAclViaSubscribe PASSED

kafka.api.SaslPlainSslEndToEndAuthorizationTest > testProduceConsumeViaAssign 
STARTED

kafka.api.SaslPlainSslEndToEndAuthorizationTest > testProduceConsumeViaAssign 
PASSED

kafka.api.SaslPlainSslEndToEndAuthorizationTest > 
testNoConsumeWithDescribeAclViaAssign STARTED

kafka.api.SaslPlainSslEndToEndAuthorizationTest > 
testNoConsumeWithDescribeAclViaAssign PASSED

kafka.api.SaslPlainSslEndToEndAuthorizationTest > 
testNoConsumeWithDescribeAclViaSubscribe STARTED

kafka.api.SaslPlainSslEndToEndAuthorizationTest > 
testNoConsumeWithDescribeAclViaSubscribe PASSED

kafka.api.SaslPlainSslEndToEndAuthorizationTest > 
testNoConsumeWithoutDescribeAclViaAssign STARTED

kafka.api.SaslPlainSslEndToEndAuthorizationTest > 
testNoConsumeWithoutDescribeAclViaAssign PASSED

kafka.api.SaslPlainSslEndToEndAuthorizationTest > testNoGroupAcl STARTED

kafka.api.SaslPlainSslEndToEndAuthorizationTest > testNoGroupAcl PASSED

kafka.api.SaslPlainSslEndToEndAuthorizationTest > testNoProduceWithDescribeAcl 
STARTED

kafka.api.SaslPlainSslEndToEndAuthorizationTest > testNoProduceWithDescribeAcl 
PASSED

kafka.api.SaslPlainSslEndToEndAuthorizationTest > 
testProduceConsumeViaSubscribe STARTED

kafka.api.SaslPlainSslEndToEndAuthorizationTest > 
testProduceConsumeViaSubscribe PASSED

kafka.api.SaslPlainSslEndToEndAuthorizationTest > 
testNoProduceWithoutDescribeAcl STARTED

kafka.api.SaslPlainSslEndToEndAuthorizationTest > 
testNoProduceWithoutDescribeAcl PASSED

kafka.api.SslEndToEndAuthorizationTest > 
testNoConsumeWithoutDescribeAclViaSubscribe STARTED

kafka.api.SslEndToEndAuthorizationTest > 
testNoConsumeWithoutDescribeAclViaSubscribe PASSED

kafka.api.SslEndToEndAuthorizationTest > testProduceConsumeViaAssign STARTED

kafka.api.SslEndToEndAuthorizationTest > testProduceConsumeViaAssign PASSED

kafka.api.SslEndToEndAuthorizationTest > testNoConsumeWithDescribeAclViaAssign 
STARTED

kafka.api.SslEndToEndAuthorizationTest > testNoConsumeWithDescribeAclViaAssign 
PASSED

kafka.api.SslEndToEndAuthorizationTest > 
testNoConsumeWithDescribeAclViaSubscribe STARTED

kafka.api.SslEndToEndAuthorizationTest > 
testNoConsumeWithDescribeAclViaSubscribe PASSED

kafka.api.SslEndToEndAuthorizationTest > 
testNoConsumeWithoutDescribeAclViaAssign STARTED

kafka.api.SslEndToEndAuthorizationTest > 
testNoConsumeWithoutDescribeAclViaAssign PASSED

kafka.api.SslEndToEndAuthorizationTest > testNoGroupAcl STARTED

kafka.api.SslEndToEndAuthorizationTest > testNoGroupAcl PASSED

kafka.api.SslEndToEndAuthorizationTest > testNoProduceWithDescribeAcl STARTED

kafka.api.SslEndToEndAuthorizationTest > testNoProduceWithDescribeAcl PASSED


[GitHub] kafka-site issue #44: 0102 fixes

2017-02-03 Thread derrickdoo
Github user derrickdoo commented on the issue:

https://github.com/apache/kafka-site/pull/44
  
Here's the corresponding PR to get these changes permanently into the main 
Kafka repo for future releases https://github.com/apache/kafka/pull/2495




[jira] [Commented] (KAFKA-4725) Kafka broker fails due to OOM when producer exceeds throttling quota for extended periods of time

2017-02-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852189#comment-15852189
 ] 

ASF GitHub Bot commented on KAFKA-4725:
---

GitHub user halorgium opened a pull request:

https://github.com/apache/kafka/pull/2496

KAFKA-4725: Stop leaking messages in produce request body when requests are 
delayed

This change is in response to 
[KAFKA-4725](https://issues.apache.org/jira/browse/KAFKA-4725). 

When a produce request is received, if the user/client is exceeding their 
produce quota, the response will be delayed until the quota is refilled 
appropriately. 

Unfortunately, the request body is still referenced in the callback which 
in turn leaks the messages contained within the request. 

This change allows the `KafkaApis` method to take ownership of the request 
body from the `RequestChannel.Request` object. 

I am not sure whether this breaks other invariants which are assumed within 
other parts of Kafka. 
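A generic illustration of the leak pattern the PR describes (hypothetical
names, not Kafka's actual classes): a delayed callback that captures the whole
request keeps its large body reachable until the callback runs, while
capturing only the needed fields releases it.

{code}
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of the leak: the throttled callback closes over the
// whole request, pinning its large body in memory while the response waits.
class ThrottleLeakSketch {
    static class Request {
        final byte[] body = new byte[20 * 1024 * 1024]; // large message payload
        final String clientId = "producer-1";
    }

    static final Queue<Runnable> delayed = new ArrayDeque<>();

    static void leaky(Request request) {
        // Captures `request`, keeping the 20 MB body reachable until it runs.
        delayed.add(() -> System.out.println("respond to " + request.clientId));
    }

    static void fixed(Request request) {
        String clientId = request.clientId; // copy only the needed field
        delayed.add(() -> System.out.println("respond to " + clientId));
    }

    public static void main(String[] args) {
        leaky(new Request());
        fixed(new Request());
        delayed.forEach(Runnable::run);
    }
}
{code}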

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/heroku/kafka fix-throttled-response-leak

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2496.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2496


commit ddb0541b156db546fbf6e065670fb25d6e4baba2
Author: Tim Carey-Smith 
Date:   2017-02-01T23:18:43Z

Stop leaking produce request in throttled requests

Further isolate the request from the callbacks

Remove pointless changes

Move body ownership logic into RequestChannel




> Kafka broker fails due to OOM when producer exceeds throttling quota for 
> extended periods of time
> -
>
> Key: KAFKA-4725
> URL: https://issues.apache.org/jira/browse/KAFKA-4725
> Project: Kafka
>  Issue Type: Bug
>  Components: core, producer 
>Affects Versions: 0.10.1.1
> Environment: Ubuntu Trusty (14.04.5), Oracle JDK 8
>Reporter: Jeff Chao
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.3.0, 0.10.2.1
>
> Attachments: oom-references.png
>
>
> Steps to Reproduce:
> 1. Create a non-compacted topic with 1 partition
> 2. Set a produce quota of 512 KB/s
> 3. Send messages at 20 MB/s
> 4. Observe heap memory growth as time progresses
> Investigation:
> While running performance tests with a user configured with a produce quota, 
> we found that the lead broker serving the requests would exhaust heap memory 
> if the producer sustained an inbound request throughput greater than the 
> produce quota. 
> Upon further investigation, we took a heap dump from that broker process and 
> discovered the ThrottledResponse object has an indirect reference to the 
> byte[] holding the messages associated with the ProduceRequest. 
> We're happy to contribute a patch but in the meantime wanted to first raise 
> the issue and get feedback from the community.





[GitHub] kafka pull request #2496: KAFKA-4725: Stop leaking messages in produce reque...

2017-02-03 Thread halorgium
GitHub user halorgium opened a pull request:

https://github.com/apache/kafka/pull/2496

KAFKA-4725: Stop leaking messages in produce request body when requests are 
delayed

This change is in response to 
[KAFKA-4725](https://issues.apache.org/jira/browse/KAFKA-4725). 

When a produce request is received, if the user/client is exceeding their 
produce quota, the response will be delayed until the quota is refilled 
appropriately. 

Unfortunately, the request body is still referenced in the callback which 
in turn leaks the messages contained within the request. 

This change allows the `KafkaApis` method to take ownership of the request 
body from the `RequestChannel.Request` object. 

I am not sure whether this breaks other invariants which are assumed within 
other parts of Kafka. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/heroku/kafka fix-throttled-response-leak

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2496.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2496


commit ddb0541b156db546fbf6e065670fb25d6e4baba2
Author: Tim Carey-Smith 
Date:   2017-02-01T23:18:43Z

Stop leaking produce request in throttled requests

Further isolate the request from the callbacks

Remove pointless changes

Move body ownership logic into RequestChannel






[GitHub] kafka pull request #2495: Adjustments to docs to make them work with latest ...

2017-02-03 Thread derrickdoo
GitHub user derrickdoo opened a pull request:

https://github.com/apache/kafka/pull/2495

Adjustments to docs to make them work with latest template

- Change introduction.html to be a partial like all the other doc content 
sections
- Fix typo with include statement for the footer of docs
- Stop embedding the full Streams documentation inside of 
documentation.html now that we're treating it as a separate section.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/derrickdoo/kafka docUpdates

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2495.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2495


commit e75b2796fb586051e47d75c6c8f26c5dcd266e66
Author: Derrick Or 
Date:   2017-02-03T21:54:32Z

adjustments to docs to make them work with latest template






Re: [VOTE] KIP-98: Exactly Once Delivery and Transactional Messaging

2017-02-03 Thread Rajini Sivaram
+1 (non-binding)

(with additional authorization from Jason's note in the discussion thread)


On Fri, Feb 3, 2017 at 1:10 AM, Apurva Mehta  wrote:

> The wiki has been updated with a section on authorization, as well as a
> summary of the message format changes.
>
> On Thu, Feb 2, 2017 at 9:38 AM, Jason Gustafson 
> wrote:
>
> > Thanks Tom, we'll update the wiki to reflect all the movement on the
> design
> > document. Did you have a specific concern with the new ACLs?
> >
> > -Jason
> >
> > On Thu, Feb 2, 2017 at 6:49 AM, Ismael Juma  wrote:
> >
> > > Hi Tom,
> > >
> > > That is a good point. During the discussion, it was agreed that changes
> > to
> > > public interfaces (message format, protocol, ACLs, etc.) would be
> copied
> > to
> > > the wiki once things had settled down, but it looks like that
> hasn't
> > > been done yet. I agree that it makes sense to do it before people vote
> on
> > > it.
> > >
> > > Ismael
> > >
> > > On Thu, Feb 2, 2017 at 2:42 PM, Tom Crayford 
> > wrote:
> > >
> > > > -1 (non-binding)
> > > >
> > > > I've been slow at keeping up with the KIP and the discussion thread.
> > This
> > > > is an exciting and quite complex new feature, which provides great
> new
> > > > functionality.
> > > >
> > > > There's a thing I noticed missing from the KIP that's present in the
> > > google
> > > > doc - the doc talks about ACLs for TransactionalId. If those are
> going
> > to
> > > > land with the KIP, I think they should be included in the KIP itself,
> > as
> > > > new ACLs are significant security changes.
> > > >
> > > > On Thu, Feb 2, 2017 at 10:04 AM, Eno Thereska <
> eno.there...@gmail.com>
> > > > wrote:
> > > >
> > > > > +1 (non-binding).
> > > > >
> > > > > Excellent work and discussions!
> > > > >
> > > > > Eno
> > > > > > On 2 Feb 2017, at 04:13, Guozhang Wang 
> wrote:
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > We would like to start the voting process for KIP-98. The KIP can
> > be
> > > > > found
> > > > > > at
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > 98+-+Exactly+Once+Delivery+and+Transactional+Messaging
> > > > > >
> > > > > > Discussion thread can be found here:
> > > > > >
> > > > > > http://search-hadoop.com/m/Kafka/uyzND1jwZrr7HRHf?subj=+
> > > > > DISCUSS+KIP+98+Exactly+Once+Delivery+and+Transactional+Messaging
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > --
> > > > > > -- Guozhang
> > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2017-02-03 Thread Rajini Sivaram
Hi Jason,

Thank you for the responses. Agree that authorizing transactional.id in the
producer requests will be good enough for version 1. And making it tighter
in future based on delegation tokens sounds good too.

Regards,

Rajini


On Fri, Feb 3, 2017 at 8:04 PM, Jason Gustafson  wrote:

> Hey Rajini,
>
> Thanks for the questions. Responses below:
>
>
> > 1. Will the transaction coordinator check topic ACLs based on the
> > requesting client's credentials? Access to transaction logs, topics being
> > added for transaction etc?
>
>
> Good question. I think it makes sense to check topic Write permission when
> adding partitions to the transaction. I'll add this to the document.
> Perhaps authorization to the transaction log itself, however, can be
> assumed from having access to the ProducerTransactionalId resource? This
> would be similar to how access to __consumer_offsets is assumed if the
> client has access to the Group resource.
>
> 2. If I create a transactional produce request (by hand, not using the
> > producer API) with a random PID (random, hence unlikely to be in use),
> will
> > the broker append a transactional message to the logs, preventing LSO
> from
> > moving forward? What validation will broker do for PIDs?
>
>
> Yes, that is correct. Validation of the TransactionalId to PID binding is a
> known gap in the current proposal, and is discussed in the design document.
> Now that I'm thinking about it a bit more, I think there is a good case for
> including the TransactionalId in the ProduceRequest (I think Jun suggested
> this previously). Verifying it does not ensure that the included PID is
> correct, but it does ensure that the client is authorized to use
> transactions. If the client wanted to do an "endless transaction attack,"
> having Write access to the topic and an authorized transactionalID is all
> they would need anyway even if we could authorize the PID itself. This
> seems like a worthwhile improvement.
>
> For future work, my half-baked idea to authorize the PID binding is to
> leverage the delegation work in KIP-48. When the PID is generated, we can
> give the producer a token which is then used in produce requests (say an
> hmac covering the TransactionalId and PID).
>
>
> > 3. Will every broker check that a client sending transactional produce
> > requests at least has write access to transaction log topic since it is
> not
> > validating transactional.id (for every produce request)?
>
>  4. I understand that brokers cannot authorize the transactional id for
> each
> > produce request since requests contain only the PID. But since there is a
> > one-to-one mapping between PID and transactional.id, and a connection is
> > never expected to change its transactional.id, perhaps it is feasible to
> > add authorization and cache the results in the Session? Perhaps not for
> > version 1, but feels like it will be good to close the security gap here.
> > Obviously it would be simpler if transactional.id was in the produce
> > request if the overhead was acceptable.
>
>
> I think my response above addresses both of these. We should include the
> TransactionalId in the ProduceRequest. Of course it need not be included in
> the message format, so I'm not too concerned about the additional overhead
> it adds.
>
> Thanks,
> Jason
>
>
> On Fri, Feb 3, 2017 at 6:52 AM, Ismael Juma  wrote:
>
> > Comments inline.
> >
> > On Thu, Feb 2, 2017 at 6:28 PM, Jason Gustafson 
> > wrote:
> >
> > > Took me a while to remember why we didn't do this. The timestamp that
> is
> > > included at the message set level is the max timestamp of all messages
> in
> > > the message set as is the case in the current message format (I will
> > update
> > > the document to make this explicit). We could make the message
> timestamps
> > > relative to the max timestamp, but that makes serialization a bit
> awkward
> > > since the timestamps are not assumed to be increasing sequentially or
> > > monotonically. Once the messages in the message set had been
> determined,
> > we
> > > would need to go back and adjust the relative timestamps.
> > >
> >
> > Yes, I thought this would be a bit tricky and hence why I mentioned the
> > option of adding a new field at the message set level for the first
> > timestamp even though that's not ideal either.
> >
> > Here's one idea. We let the timestamps in the messages be varints, but we
> > > make their values be relative to the timestamp of the previous message,
> > > with the timestamp of the first message being absolute. For example, if
> > we
> > > had timestamps 500, 501, 499, then we would write 500 for the first
> > > message, 1 for the next, and -2 for the final message. Would that work?
> > Let
> > > me think a bit about it and see if there are any problems.
> > >
> >
> > It's an interesting idea. Compared to the option of having the first
> > timestamp in the message set, it's a little more space efficient as we
> > don't have both a full timestamp in the message set _and_ a varint in the
> > first message (which would always be 0, so we avoid the extra byte) and
> > also the deltas could be a little smaller in the common case. The main
> > downside is that it introduces a semantics inconsistency between the first
> > message and the rest. Not ideal, but maybe we can live with that.
> >
> > Ismael

Jenkins build is back to normal : kafka-trunk-jdk8 #1248

2017-02-03 Thread Apache Jenkins Server
See 



[GitHub] kafka pull request #2494: HOTFIX: Verifiable producer request timeout needs ...

2017-02-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2494




[GitHub] kafka-site issue #44: 0102 fixes

2017-02-03 Thread derrickdoo
Github user derrickdoo commented on the issue:

https://github.com/apache/kafka-site/pull/44
  
@guozhangwang can you take a look at this and double-check that it 
fixes all the issues you were seeing? I'll make another PR into the main 
Kafka repo to make sure we don't run into items 2-4 again. 




[jira] [Commented] (KAFKA-2376) Add Copycat metrics

2017-02-03 Thread Randall Hauch (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852164#comment-15852164
 ] 

Randall Hauch commented on KAFKA-2376:
--

I agree with [~criccomini]. It'd be great for connectors to be able to add 
their own metrics using the same framework. 

> Add Copycat metrics
> ---
>
> Key: KAFKA-2376
> URL: https://issues.apache.org/jira/browse/KAFKA-2376
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: Ewen Cheslack-Postava
>  Labels: needs-kip
>
> Copycat needs good metrics for monitoring since that will be the primary 
> insight into the health of connectors as they copy data.





[GitHub] kafka pull request #2494: HOTFIX: Verifiable producer request timeout needs ...

2017-02-03 Thread hachikuji
GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/2494

HOTFIX: Verifiable producer request timeout needs conversion to milliseconds



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka 
hotfix-verifiable-producer-request-timeout

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2494.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2494


commit b95f91e08c1dd691c9ff7f14448ebda388d6727f
Author: Jason Gustafson 
Date:   2017-02-03T21:37:08Z

HOTFIX: Verifiable producer request timeout needs conversion to milliseconds






[jira] [Commented] (KAFKA-4720) Add KStream.peek(ForeachAction<K,V>)

2017-02-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852102#comment-15852102
 ] 

ASF GitHub Bot commented on KAFKA-4720:
---

GitHub user stevenschlansker opened a pull request:

https://github.com/apache/kafka/pull/2493

KAFKA-4720: add a KStream#peek(ForeachAction<K, V>)

https://issues.apache.org/jira/browse/KAFKA-4720

Peek is a handy method to have to insert diagnostics that do not affect the 
stream itself, but some external state such as logging or metrics collection.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/stevenschlansker/kafka kafka-4720-peek

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2493.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2493


commit 3f06ee010028335674f1a9e21c0fa740e3d5a950
Author: Steven Schlansker 
Date:   2017-02-03T20:46:43Z

KAFKA-4720: add a KStream#peek(ForeachAction<K, V>)




> Add KStream.peek(ForeachAction<K,V>)
> 
>
> Key: KAFKA-4720
> URL: https://issues.apache.org/jira/browse/KAFKA-4720
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Affects Versions: 0.10.1.1
>Reporter: Steven Schlansker
>
> Java's Stream provides a handy peek method that observes elements in the 
> stream without transforming or filtering them.  While you can emulate this 
> functionality with either a filter or map, peek provides potentially useful 
> semantic information (doesn't modify the stream) and is much more concise.
> Example usage: using Dropwizard Metrics to provide event counters
> {code}
> KStream<Integer, String> s = ...;
> s.map(this::mungeData)
>  .peek((i, s) -> metrics.noteMungedEvent(i, s))
>  .filter(this::hadProcessingError)
>  .print();
> {code}



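As the description notes, the same observation can already be emulated with a
pass-through map, at the cost of some clarity. A sketch, reusing the hypothetical
mungeData/metrics/hadProcessingError helpers from the example above:

{code}
// pass-through map that only observes each record and returns it unchanged
s.map(this::mungeData)
 .map((i, str) -> { metrics.noteMungedEvent(i, str); return KeyValue.pair(i, str); })
 .filter(this::hadProcessingError)
 .print();
{code}

peek would collapse the middle map back into a single self-describing call.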


[GitHub] kafka pull request #2493: KAFKA-4720: add a KStream#peek(ForeachAction<K, V>...

2017-02-03 Thread stevenschlansker
GitHub user stevenschlansker opened a pull request:

https://github.com/apache/kafka/pull/2493

KAFKA-4720: add a KStream#peek(ForeachAction<K, V>)

https://issues.apache.org/jira/browse/KAFKA-4720

Peek is a handy method to have to insert diagnostics that do not affect the 
stream itself, but some external state such as logging or metrics collection.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/stevenschlansker/kafka kafka-4720-peek

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2493.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2493


commit 3f06ee010028335674f1a9e21c0fa740e3d5a950
Author: Steven Schlansker 
Date:   2017-02-03T20:46:43Z

KAFKA-4720: add a KStream#peek(ForeachAction<K, V>)






[jira] [Created] (KAFKA-4732) Unstable test: KStreamKTableJoinIntegrationTest.shouldCountClicksPerRegion[1]

2017-02-03 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-4732:
--

 Summary: Unstable test: 
KStreamKTableJoinIntegrationTest.shouldCountClicksPerRegion[1]
 Key: KAFKA-4732
 URL: https://issues.apache.org/jira/browse/KAFKA-4732
 Project: Kafka
  Issue Type: Bug
  Components: streams, unit tests
Reporter: Matthias J. Sax


{noformat}
java.lang.AssertionError: Condition not met within timeout 30000. Expecting 3 
records from topic output-topic-2 while only received 0: []
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:259)
at 
org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinKeyValueRecordsReceived(IntegrationTestUtils.java:221)
at 
org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinKeyValueRecordsReceived(IntegrationTestUtils.java:190)
at 
org.apache.kafka.streams.integration.KStreamKTableJoinIntegrationTest.shouldCountClicksPerRegion(KStreamKTableJoinIntegrationTest.java:295)
{noformat}





Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2017-02-03 Thread Jason Gustafson
Hey Rajini,

Thanks for the questions. Responses below:


> 1. Will the transaction coordinator check topic ACLs based on the
> requesting client's credentials? Access to transaction logs, topics being
> added for transaction etc?


Good question. I think it makes sense to check topic Write permission when
adding partitions to the transaction. I'll add this to the document.
Perhaps authorization to the transaction log itself, however, can be
assumed from having access to the ProducerTransactionalId resource? This
would be similar to how access to __consumer_offsets is assumed if the
client has access to the Group resource.

2. If I create a transactional produce request (by hand, not using the
> producer API) with a random PID (random, hence unlikely to be in use), will
> the broker append a transactional message to the logs, preventing LSO from
> moving forward? What validation will broker do for PIDs?


Yes, that is correct. Validation of the TransactionalId to PID binding is a
known gap in the current proposal, and is discussed in the design document.
Now that I'm thinking about it a bit more, I think there is a good case for
including the TransactionalId in the ProduceRequest (I think Jun suggested
this previously). Verifying it does not ensure that the included PID is
correct, but it does ensure that the client is authorized to use
transactions. If the client wanted to do an "endless transaction attack,"
having Write access to the topic and an authorized transactionalID is all
they would need anyway even if we could authorize the PID itself. This
seems like a worthwhile improvement.

For future work, my half-baked idea to authorize the PID binding is to
leverage the delegation work in KIP-48. When the PID is generated, we can
give the producer a token which is then used in produce requests (say an
hmac covering the TransactionalId and PID).
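
To make the half-baked idea concrete, a minimal sketch of such a token; all names
here are illustrative assumptions, not part of KIP-98 or KIP-48:

{code}
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: the coordinator computes an HMAC binding the
// TransactionalId to the PID, hands it to the producer, and brokers
// recompute it to validate later ProduceRequests.
public class PidTokenSketch {
    static byte[] token(byte[] coordinatorSecret, String transactionalId, long pid)
            throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(coordinatorSecret, "HmacSHA256"));
        mac.update(transactionalId.getBytes(StandardCharsets.UTF_8));
        mac.update(ByteBuffer.allocate(8).putLong(pid).array());
        return mac.doFinal();
    }
}
{code}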


> 3. Will every broker check that a client sending transactional produce
> requests at least has write access to transaction log topic since it is not
> validating transactional.id (for every produce request)?

 4. I understand that brokers cannot authorize the transactional id for each
> produce request since requests contain only the PID. But since there is a
> one-to-one mapping between PID and transactional.id, and a connection is
> never expected to change its transactional.id, perhaps it is feasible to
> add authorization and cache the results in the Session? Perhaps not for
> version 1, but feels like it will be good to close the security gap here.
> Obviously it would be simpler if transactional.id was in the produce
> request if the overhead was acceptable.


I think my response above addresses both of these. We should include the
TransactionalId in the ProduceRequest. Of course it need not be included in
the message format, so I'm not too concerned about the additional overhead
it adds.

Thanks,
Jason


On Fri, Feb 3, 2017 at 6:52 AM, Ismael Juma  wrote:

> Comments inline.
>
> On Thu, Feb 2, 2017 at 6:28 PM, Jason Gustafson 
> wrote:
>
> > Took me a while to remember why we didn't do this. The timestamp that is
> > included at the message set level is the max timestamp of all messages in
> > the message set as is the case in the current message format (I will
> update
> > the document to make this explicit). We could make the message timestamps
> > relative to the max timestamp, but that makes serialization a bit awkward
> > since the timestamps are not assumed to be increasing sequentially or
> > monotonically. Once the messages in the message set had been determined,
> we
> > would need to go back and adjust the relative timestamps.
> >
>
> Yes, I thought this would be a bit tricky and hence why I mentioned the
> option of adding a new field at the message set level for the first
> timestamp even though that's not ideal either.
>
> Here's one idea. We let the timestamps in the messages be varints, but we
> > make their values be relative to the timestamp of the previous message,
> > with the timestamp of the first message being absolute. For example, if
> we
> > had timestamps 500, 501, 499, then we would write 500 for the first
> > message, 1 for the next, and -2 for the final message. Would that work?
> Let
> > me think a bit about it and see if there are any problems.
> >
>
> It's an interesting idea. Compared to the option of having the first
> timestamp in the message set, it's a little more space efficient as we
> don't have both a full timestamp in the message set _and_ a varint in the
> first message (which would always be 0, so we avoid the extra byte) and
> also the deltas could be a little smaller in the common case. The main
> downside is that it introduces a semantics inconsistency between the first
> message and the rest. Not ideal, but maybe we can live with that.
>
> Ismael
>
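
To make the 500, 501, 499 example concrete, a self-contained sketch of the
encoding being discussed; the zigzag step is an assumption about how the signed
deltas would be made varint-friendly, not a settled part of the message format:

{code}
import java.util.ArrayList;
import java.util.List;

public class TimestampDeltaSketch {
    // zigzag-map a signed delta so small negatives (like -2) stay small
    static long zigzag(long n) { return (n << 1) ^ (n >> 63); }

    public static void main(String[] args) {
        long[] timestamps = {500, 501, 499};
        List<Long> encoded = new ArrayList<>();
        encoded.add(timestamps[0]);                          // first timestamp absolute
        for (int i = 1; i < timestamps.length; i++)
            encoded.add(timestamps[i] - timestamps[i - 1]);  // deltas: 1, -2
        System.out.println(encoded);                         // [500, 1, -2]
        System.out.println(zigzag(-2));                      // 3: one varint byte
    }
}
{code}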


[VOTE] KIP-111 Kafka should preserve the Principal generated by the PrincipalBuilder while processing the request received on socket channel, on the broker.

2017-02-03 Thread Mayuresh Gharat
Hi All,

It seems that there are no further concerns with KIP-111. At this point
we would like to start the voting process. The KIP can be found at
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=67638388

Thanks,

Mayuresh


Re: [VOTE] KIP-54: Sticky Partition Assignment Strategy

2017-02-03 Thread Vahid S Hashemian
The voting on this KIP concludes with 3 binding and 3 non-binding +1 
votes.
Thanks to everyone who provided feedback / voted and to Jeff Widman for 
reviving the voting thread.

Binding votes by:
* Jason Gustafson
* Ewen Cheslack-Postava
* Guozhang Wang

Non-binding votes by:
* Rajini Sivaram
* Bill Bejeck
* Mickael Maison
 
--Vahid




From:   Guozhang Wang 
To: "dev@kafka.apache.org" 
Date:   02/02/2017 11:26 PM
Subject:Re: [VOTE] KIP-54: Sticky Partition Assignment Strategy



+1 (binding).

Cheers.

On Thu, Feb 2, 2017 at 10:35 PM, Ewen Cheslack-Postava 
wrote:

> +1
>
> I don't think this solves all the stickiness/incremental rebalancing
> problems we'll eventually want to address, but it's a nice improvement,
> would be a benefit for a fair number of applications, and as it's a 
clean
> extension to the existing options it doesn't come with any significant
> compatibility concerns.
>
> (Also, this should bump this thread, which Jeff Widman was wondering 
about.
> It's lacking at least 1 more binding vote before it could pass.)
>
> -Ewen
>
> On Thu, Sep 22, 2016 at 1:43 AM, Mickael Maison 

> wrote:
>
> > +1 (non-binding)
> >
> > On Thu, Sep 15, 2016 at 8:32 PM, Bill Bejeck  
wrote:
> > > +1
> > >
> > > On Thu, Sep 15, 2016 at 5:16 AM, Rajini Sivaram <
> > > rajinisiva...@googlemail.com> wrote:
> > >
> > >> +1 (non-binding)
> > >>
> > >> On Wed, Sep 14, 2016 at 12:37 AM, Jason Gustafson 
 >
> > >> wrote:
> > >>
> > >> > Thanks for the KIP. +1 from me.
> > >> >
> > >> > On Tue, Sep 13, 2016 at 12:05 PM, Vahid S Hashemian <
> > >> > vahidhashem...@us.ibm.com> wrote:
> > >> >
> > >> > > Hi all,
> > >> > >
> > >> > > Thanks for providing feedback on this KIP so far.
> > >> > > The KIP was discussed during the KIP meeting today and there
> doesn't
> > >> seem
> > >> > > to be any unaddressed issue at this point.
> > >> > >
> > >> > > So I would like to initiate the voting process.
> > >> > >
> > >> > > The KIP can be found here:
> > >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >> > > 54+-+Sticky+Partition+Assignment+Strategy
> > >> > > And the full discussion thread is here:
> > >> > > https://www.mail-archive.com/dev@kafka.apache.org/msg47607.html
> > >> > >
> > >> > > Thanks.
> > >> > > --Vahid
> > >> > >
> > >> > >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Regards,
> > >>
> > >> Rajini
> > >>
> >
>



-- 
-- Guozhang






[jira] [Commented] (KAFKA-3452) Support session windows

2017-02-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851970#comment-15851970
 ] 

ASF GitHub Bot commented on KAFKA-3452:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2333


> Support session windows
> ---
>
> Key: KAFKA-3452
> URL: https://issues.apache.org/jira/browse/KAFKA-3452
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Damian Guy
>  Labels: api, kip
> Fix For: 0.10.2.0
>
>
> The Streams DSL currently does not provide session window as in the DataFlow 
> model. We have seen some common use cases for this feature and it's better 
> adding this support asap.
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-94+Session+Windows





[GitHub] kafka pull request #2333: KAFKA-3452 Follow-up: Optimize ByteStore Scenarios

2017-02-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2333




[jira] [Commented] (KAFKA-2376) Add Copycat metrics

2017-02-03 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851926#comment-15851926
 ] 

Chris Riccomini commented on KAFKA-2376:


One thing we'd like to see here is the ability to hook into the Kafka Connect 
metrics from our connectors. That way we can add connector-specific metrics, 
and have them flow through the same pipeline.

> Add Copycat metrics
> ---
>
> Key: KAFKA-2376
> URL: https://issues.apache.org/jira/browse/KAFKA-2376
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: Ewen Cheslack-Postava
>  Labels: needs-kip
>
> Copycat needs good metrics for monitoring since that will be the primary 
> insight into the health of connectors as they copy data.





Build failed in Jenkins: kafka-trunk-jdk8 #1247

2017-02-03 Thread Apache Jenkins Server
See 

Changes:

[wangguoz] KAFKA-4662: adding test coverage for addSource methods with

[wangguoz] KAFKA-4649: Improve test coverage GlobalStateManagerImpl

--
[...truncated 15916 lines...]
org.apache.kafka.clients.NodeApiVersionsTest > 
testUsableVersionCalculationNoKnownVersions PASSED

org.apache.kafka.clients.NodeApiVersionsTest > testVersionsToString STARTED

org.apache.kafka.clients.NodeApiVersionsTest > testVersionsToString PASSED

org.apache.kafka.clients.NodeApiVersionsTest > testUnsupportedVersionsToString 
STARTED

org.apache.kafka.clients.NodeApiVersionsTest > testUnsupportedVersionsToString 
PASSED

org.apache.kafka.clients.NodeApiVersionsTest > testUsableVersionCalculation 
STARTED

org.apache.kafka.clients.NodeApiVersionsTest > testUsableVersionCalculation 
PASSED

org.apache.kafka.clients.NodeApiVersionsTest > testUsableVersionLatestVersions 
STARTED

org.apache.kafka.clients.NodeApiVersionsTest > testUsableVersionLatestVersions 
PASSED

org.apache.kafka.clients.producer.internals.RecordAccumulatorTest > 
testRetryBackoff STARTED

org.apache.kafka.clients.producer.internals.RecordAccumulatorTest > 
testRetryBackoff PASSED

org.apache.kafka.clients.producer.internals.RecordAccumulatorTest > 
testNextReadyCheckDelay STARTED

org.apache.kafka.clients.producer.internals.RecordAccumulatorTest > 
testNextReadyCheckDelay PASSED

org.apache.kafka.clients.producer.internals.RecordAccumulatorTest > 
testStressfulSituation STARTED

org.apache.kafka.clients.producer.internals.RecordAccumulatorTest > 
testStressfulSituation PASSED

org.apache.kafka.clients.producer.internals.RecordAccumulatorTest > 
testAwaitFlushComplete STARTED

org.apache.kafka.clients.producer.internals.RecordAccumulatorTest > 
testAwaitFlushComplete PASSED

org.apache.kafka.clients.producer.internals.RecordAccumulatorTest > testFlush 
STARTED

org.apache.kafka.clients.producer.internals.RecordAccumulatorTest > testFlush 
PASSED

org.apache.kafka.clients.producer.internals.RecordAccumulatorTest > testFull 
STARTED

org.apache.kafka.clients.producer.internals.RecordAccumulatorTest > testFull 
PASSED

org.apache.kafka.clients.producer.internals.RecordAccumulatorTest > 
testAppendInExpiryCallback STARTED

org.apache.kafka.clients.producer.internals.RecordAccumulatorTest > 
testAppendInExpiryCallback PASSED

org.apache.kafka.clients.producer.internals.RecordAccumulatorTest > 
testAbortIncompleteBatches STARTED

org.apache.kafka.clients.producer.internals.RecordAccumulatorTest > 
testAbortIncompleteBatches PASSED

org.apache.kafka.clients.producer.internals.RecordAccumulatorTest > 
testMutedPartitions STARTED

org.apache.kafka.clients.producer.internals.RecordAccumulatorTest > 
testMutedPartitions PASSED

org.apache.kafka.clients.producer.internals.RecordAccumulatorTest > 
testExpiredBatches STARTED

org.apache.kafka.clients.producer.internals.RecordAccumulatorTest > 
testExpiredBatches PASSED

org.apache.kafka.clients.producer.internals.RecordAccumulatorTest > testLinger 
STARTED

org.apache.kafka.clients.producer.internals.RecordAccumulatorTest > testLinger 
PASSED

org.apache.kafka.clients.producer.internals.RecordAccumulatorTest > 
testPartialDrain STARTED

org.apache.kafka.clients.producer.internals.RecordAccumulatorTest > 
testPartialDrain PASSED

org.apache.kafka.clients.producer.internals.RecordAccumulatorTest > 
testAppendLarge STARTED

org.apache.kafka.clients.producer.internals.RecordAccumulatorTest > 
testAppendLarge PASSED

org.apache.kafka.clients.producer.internals.SenderTest > 
testMetadataTopicExpiry STARTED

org.apache.kafka.clients.producer.internals.SenderTest > 
testMetadataTopicExpiry PASSED

org.apache.kafka.clients.producer.internals.SenderTest > testQuotaMetrics 
STARTED

org.apache.kafka.clients.producer.internals.SenderTest > testQuotaMetrics PASSED

org.apache.kafka.clients.producer.internals.SenderTest > testRetries STARTED

org.apache.kafka.clients.producer.internals.SenderTest > testRetries PASSED

org.apache.kafka.clients.producer.internals.SenderTest > testSendInOrder STARTED

org.apache.kafka.clients.producer.internals.SenderTest > testSendInOrder PASSED

org.apache.kafka.clients.producer.internals.SenderTest > testSimple STARTED

org.apache.kafka.clients.producer.internals.SenderTest > testSimple PASSED

org.apache.kafka.clients.producer.internals.DefaultPartitionerTest > 
testKeyPartitionIsStable STARTED

org.apache.kafka.clients.producer.internals.DefaultPartitionerTest > 
testKeyPartitionIsStable PASSED

org.apache.kafka.clients.producer.internals.DefaultPartitionerTest > 
testRoundRobin STARTED

org.apache.kafka.clients.producer.internals.DefaultPartitionerTest > 
testRoundRobin PASSED

org.apache.kafka.clients.producer.internals.DefaultPartitionerTest > 
testRoundRobinWithUnavailablePartitions STARTED

org.apache.kafka.clients.producer.internals.DefaultPartitionerTest > 
testRoundRobinWithUnavailablePartitions PASSED

[jira] [Commented] (KAFKA-4614) Long GC pause harming broker performance which is caused by mmap objects created for OffsetIndex

2017-02-03 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851905#comment-15851905
 ] 

Ismael Juma commented on KAFKA-4614:


Thanks [~thetaphi]. We are aware of the issue and we are tracking it in 
KAFKA-4501. You will notice that we have a link to the Lucene fix there 
already. :)

> Long GC pause harming broker performance which is caused by mmap objects 
> created for OffsetIndex
> 
>
> Key: KAFKA-4614
> URL: https://issues.apache.org/jira/browse/KAFKA-4614
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.0.1
>Reporter: Yuto Kawamura
>Assignee: Yuto Kawamura
>  Labels: latency, performance
> Fix For: 0.10.2.0
>
> Attachments: kafka-produce-99th.png
>
>
> First let me clarify our system environment information as I think it's 
> important to understand this issue:
> OS: CentOS6
> Kernel version: 2.6.32-XX
> Filesystem used for data volume: XFS
> Java version: 1.8.0_66
> GC option: Kafka default(G1GC)
> Kafka version: 0.10.0.1
> h2. Phenomenon
> In our Kafka cluster, the usual response time for a Produce request is about 1ms 
> at the 50th percentile to 10ms at the 99th percentile. All topics are configured to 
> have 3 replicas and all producers are configured with {{acks=all}}, so this time 
> includes replication latency.
> Sometimes we observe the 99th percentile latency increase to 100ms ~ 500ms, 
> but in most cases the time-consuming part is "Remote", which means it is 
> caused by slow replication, which is known to happen for various reasons (that 
> is also something we're trying to improve, but it is out of scope for this 
> issue).
> However, we found some different patterns which happen rarely 
> but steadily, 3 ~ 5 times a day on each server. The interesting part is 
> that "RequestQueue" also increased, along with "Total" and "Remote".
> At the same time, we observed that the disk read metrics (in terms of both 
> read bytes and read time) spike at exactly the same moment. Currently we 
> have only caught-up consumers, so this metric sticks to zero while all 
> requests are served by the page cache.
> In order to investigate what Kafka is "read"ing, I employed SystemTap and 
> wrote the following script. It traces all disk reads (only actual reads by the 
> physical device) made for the data volume by the broker process.
> {code}
> global target_pid = KAFKA_PID
> global target_dev = DATA_VOLUME
> probe ioblock.request {
>   if (rw == BIO_READ && pid() == target_pid && devname == target_dev) {
>  t_ms = gettimeofday_ms() + 9 * 3600 * 1000 // timezone adjustment
>  printf("%s,%03d:  tid = %d, device = %s, inode = %d, size = %d\n", 
> ctime(t_ms / 1000), t_ms % 1000, tid(), devname, ino, size)
>  print_backtrace()
>  print_ubacktrace()
>   }
> }
> {code}
> As a result, we could observe many logs like the one below:
> {code}
> Thu Dec 22 17:21:39 2016,209:  tid = 126123, device = sdb1, inode = -1, size 
> = 4096
>  0x81275050 : generic_make_request+0x0/0x5a0 [kernel]
>  0x81275660 : submit_bio+0x70/0x120 [kernel]
>  0xa036bcaa : _xfs_buf_ioapply+0x16a/0x200 [xfs]
>  0xa036d95f : xfs_buf_iorequest+0x4f/0xe0 [xfs]
>  0xa036db46 : _xfs_buf_read+0x36/0x60 [xfs]
>  0xa036dc1b : xfs_buf_read+0xab/0x100 [xfs]
>  0xa0363477 : xfs_trans_read_buf+0x1f7/0x410 [xfs]
>  0xa033014e : xfs_btree_read_buf_block+0x5e/0xd0 [xfs]
>  0xa0330854 : xfs_btree_lookup_get_block+0x84/0xf0 [xfs]
>  0xa0330edf : xfs_btree_lookup+0xbf/0x470 [xfs]
>  0xa032456f : xfs_bmbt_lookup_eq+0x1f/0x30 [xfs]
>  0xa032628b : xfs_bmap_del_extent+0x12b/0xac0 [xfs]
>  0xa0326f34 : xfs_bunmapi+0x314/0x850 [xfs]
>  0xa034ad79 : xfs_itruncate_extents+0xe9/0x280 [xfs]
>  0xa0366de5 : xfs_inactive+0x2f5/0x450 [xfs]
>  0xa0374620 : xfs_fs_clear_inode+0xa0/0xd0 [xfs]
>  0x811affbc : clear_inode+0xac/0x140 [kernel]
>  0x811b0776 : generic_delete_inode+0x196/0x1d0 [kernel]
>  0x811b0815 : generic_drop_inode+0x65/0x80 [kernel]
>  0x811af662 : iput+0x62/0x70 [kernel]
>  0x37ff2e5347 : munmap+0x7/0x30 [/lib64/libc-2.12.so]
>  0x7ff169ba5d47 : Java_sun_nio_ch_FileChannelImpl_unmap0+0x17/0x50 
> [/usr/jdk1.8.0_66/jre/lib/amd64/libnio.so]
>  0x7ff269a1307e
> {code}
> We took a jstack dump of the broker process and found that tid = 126123 
> corresponds to the thread used for GC (hex(tid) == nid (Native Thread ID)):
> {code}
> $ grep 0x1ecab /tmp/jstack.dump
> "Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x7ff278d0c800 
> nid=0x1ecab in Object.wait() [0x7ff17da11000]
> {code}
> In order to confirm, we enabled 

Re: [DISCUSS] KIP-117: Add a public AdministrativeClient API for Kafka admin operations

2017-02-03 Thread Colin McCabe
On Thu, Feb 2, 2017, at 15:02, Ismael Juma wrote:
> Hi Colin,
> 
> Thanks for the KIP, great to make progress on this. I have some initial
> comments, will post more later:
> 
> 1. We have KafkaProducer that implements the Producer interface and
> KafkaConsumer that implements the Consumer interface. Maybe we could have
> KafkaAdminClient that implements the AdminClient interface? Or maybe just
> KafkaAdmin. Not sure, some ideas for consideration. Also, I don't think
> we
> should worry about a name clash with the internal AdminClient written in
> Scala. That will go away soon enough and choosing a good name for the
> public class is what matters.

Hi Ismael,

Thanks for taking a look.

I guess my thought process was that users might find it confusing if the
public API and the old private API had the same name.  "What do you
mean, I have to upgrade to release X to get AdminClient, I have it right
here?"  I do have a slight preference for the shorter name, though, so
if this isn't a worry, we can change it to AdminClient.

> 
> 2. We should include the proposed package name in the KIP
> (probably org.apache.kafka.clients.admin?).

Good idea.  I will add the package name to the KIP.

> 
> 3. It would be good to list the supported configs.

OK

> 
> 4. KIP-107, which passed the vote, specifies the introduction of a method
> to AdminClient with the following signature. We should figure out how it
> would look given this proposal.
> 
> Future<Map<TopicPartition, PurgeDataResult>>
> purgeDataBefore(Map<TopicPartition, Long> offsetForPartition)
> 
> 5. I am not sure about rejecting the Futures-based API. I think I would
> prefer that, personally. Grant had an interesting idea of not exposing
> the
> batch methods in the AdminClient to start with to keep it simple and
> relying on a Future based API to make it easy for users to run things
> concurrently. This is consistent with the producer... 

So, there are two ways that an operation can be "async" here which are
very separate.

There is "async on the server."  This basically means that we tell the
server to do something and don't wait for a confirmation that it
succeeded.  For example, in the current proposal, users can call
createTopic(new Topic(...), CreateTopicsFlags.NONBLOCKING).  The call
will wait for the server to get the request, which will go into
purgatory.  Later on, the request may succeed or fail, but the client
won't know either way.  In RPC terms, this means we set the timeout
value to 0.

Then there is "async on the client."  This just means that the client
thread doesn't block-- instead, it gets back a Future (or similar
object).  What this boils down to in terms of implementation is that a
message gets put on some queue somewhere and the client thread continues
running.

"async on the client" tends to be good when you want to churn out a ton
of requests without using lots of threads.  However, it is a more
confusing mental model for most programmers.

You can easily translate a Futures-based API into a blocking API by
having blocking shims that just create the Future and call get().
Similarly, you can transform a blocking API into a Futures-based API by
using a thread pool.  Thread pools use resources, though, whereas having
function shims does not.
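
The shim point in code form — a hedged sketch with illustrative names, not the
proposed KIP-117 API:

{code}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;

interface TopicOps {
    Future<Void> createTopicAsync(String name);   // futures-based primitive

    // blocking shim: no thread pool needed, just wait on the future
    default void createTopic(String name) throws Exception {
        createTopicAsync(name).get();
    }
}

class NoopTopicOps implements TopicOps {
    public Future<Void> createTopicAsync(String name) {
        return CompletableFuture.completedFuture(null);  // pretend success
    }
}
{code}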

I haven't seen any discussion here about what we gain by using a
Futures-based API.  It makes sense to use Futures in the Producer, since
they're more flexible, and users are potentially creating lots and lots
of messages.  I'm not sure if users would want to do lots and lots of
admin operations with a single thread.  I'd be curious to hear a little
more from potential end-users about what API would be most flexible and
usable for them.  I'm open to ideas.

best,
Colin

> 
> Ismael
> 
> On Thu, Feb 2, 2017 at 6:54 PM, Colin McCabe  wrote:
> 
> > Hi all,
> >
> > I wrote up a Kafka improvement proposal for adding an
> > AdministrativeClient interface.  This is a continuation of the work on
> > KIP-4 towards centralized administrative operations.  Please check out
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-117%3A+Add+a+public+
> > AdministrativeClient+API+for+Kafka+admin+operations
> >
> > regards,
> > Colin
> >


Re: [DISCUSS] KIP-118: Drop Support for Java 7 in Kafka 0.11

2017-02-03 Thread Eno Thereska
Makes sense.

Eno

> On 3 Feb 2017, at 10:38, Ismael Juma  wrote:
> 
> Hi all,
> 
> I have posted a KIP for dropping support for Java 7 in Kafka 0.11:
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-118%3A+Drop+Support+for+Java+7+in+Kafka+0.11
> 
> Most people were supportive when we last discussed the topic[1], but there
> were a few concerns. I believe the following should mitigate the concerns:
> 
> 1. The new proposal suggests dropping support in the next major version
> instead of the next minor version.
> 2. KIP-97 which is part of 0.10.2 means that 0.11 clients will support 0.10
> brokers (0.11 brokers will also support 0.10 clients as usual), so there is
> even more flexibility on incremental upgrades.
> 3. Java 9 will be released shortly after the next Kafka release, so we'll
> be supporting the 2 most recent Java releases, which is a reasonable policy.
> 4. 8 months have passed since the last proposal and the release after
> 0.10.2 won't be out for another 4 months, which should hopefully be enough
> time for Java 8 to be even more established. We haven't decided when the
> next major release will happen, but we know that it won't happen before
> June 2017.
> 
> Please take a look at the proposal and share your feedback.
> 
> Thanks,
> Ismael
> 
> [1] http://search-hadoop.com/m/Kafka/uyzND1oIhV61GS5Sf2



[jira] [Commented] (KAFKA-4614) Long GC pause harming broker performance which is caused by mmap objects created for OffsetIndex

2017-02-03 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851904#comment-15851904
 ] 

Jun Rao commented on KAFKA-4614:


[~thetaphi], thanks for the info. Much appreciated!

> Long GC pause harming broker performance which is caused by mmap objects 
> created for OffsetIndex
> 
>
> Key: KAFKA-4614
> URL: https://issues.apache.org/jira/browse/KAFKA-4614
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.0.1
>Reporter: Yuto Kawamura
>Assignee: Yuto Kawamura
>  Labels: latency, performance
> Fix For: 0.10.2.0
>
> Attachments: kafka-produce-99th.png
>
>
> First let me clarify our system environment information as I think it's 
> important to understand this issue:
> OS: CentOS6
> Kernel version: 2.6.32-XX
> Filesystem used for data volume: XFS
> Java version: 1.8.0_66
> GC option: Kafka default(G1GC)
> Kafka version: 0.10.0.1
> h2. Phenomenon
> In our Kafka cluster, the usual response time for a Produce request is about 1ms 
> at the 50th percentile to 10ms at the 99th percentile. All topics are configured to 
> have 3 replicas and all producers are configured with {{acks=all}}, so this time 
> includes replication latency.
> Sometimes we observe the 99th percentile latency increase to 100ms ~ 500ms, 
> but in most cases the time-consuming part is "Remote", which means it is 
> caused by slow replication, which is known to happen for various reasons (that 
> is also something we're trying to improve, but it is out of scope for this 
> issue).
> However, we found some different patterns which happen rarely 
> but steadily, 3 ~ 5 times a day on each server. The interesting part is 
> that "RequestQueue" also increased, along with "Total" and "Remote".
> At the same time, we observed that the disk read metrics (in terms of both 
> read bytes and read time) spike at exactly the same moment. Currently we 
> have only caught-up consumers, so this metric sticks to zero while all 
> requests are served by the page cache.
> In order to investigate what Kafka is "read"ing, I employed SystemTap and 
> wrote the following script. It traces all disk reads (only actual reads by the 
> physical device) made for the data volume by the broker process.
> {code}
> global target_pid = KAFKA_PID
> global target_dev = DATA_VOLUME
> probe ioblock.request {
>   if (rw == BIO_READ && pid() == target_pid && devname == target_dev) {
>  t_ms = gettimeofday_ms() + 9 * 3600 * 1000 // timezone adjustment
>  printf("%s,%03d:  tid = %d, device = %s, inode = %d, size = %d\n", 
> ctime(t_ms / 1000), t_ms % 1000, tid(), devname, ino, size)
>  print_backtrace()
>  print_ubacktrace()
>   }
> }
> {code}
> As a result, we could observe many logs like the one below:
> {code}
> Thu Dec 22 17:21:39 2016,209:  tid = 126123, device = sdb1, inode = -1, size 
> = 4096
>  0x81275050 : generic_make_request+0x0/0x5a0 [kernel]
>  0x81275660 : submit_bio+0x70/0x120 [kernel]
>  0xa036bcaa : _xfs_buf_ioapply+0x16a/0x200 [xfs]
>  0xa036d95f : xfs_buf_iorequest+0x4f/0xe0 [xfs]
>  0xa036db46 : _xfs_buf_read+0x36/0x60 [xfs]
>  0xa036dc1b : xfs_buf_read+0xab/0x100 [xfs]
>  0xa0363477 : xfs_trans_read_buf+0x1f7/0x410 [xfs]
>  0xa033014e : xfs_btree_read_buf_block+0x5e/0xd0 [xfs]
>  0xa0330854 : xfs_btree_lookup_get_block+0x84/0xf0 [xfs]
>  0xa0330edf : xfs_btree_lookup+0xbf/0x470 [xfs]
>  0xa032456f : xfs_bmbt_lookup_eq+0x1f/0x30 [xfs]
>  0xa032628b : xfs_bmap_del_extent+0x12b/0xac0 [xfs]
>  0xa0326f34 : xfs_bunmapi+0x314/0x850 [xfs]
>  0xa034ad79 : xfs_itruncate_extents+0xe9/0x280 [xfs]
>  0xa0366de5 : xfs_inactive+0x2f5/0x450 [xfs]
>  0xa0374620 : xfs_fs_clear_inode+0xa0/0xd0 [xfs]
>  0x811affbc : clear_inode+0xac/0x140 [kernel]
>  0x811b0776 : generic_delete_inode+0x196/0x1d0 [kernel]
>  0x811b0815 : generic_drop_inode+0x65/0x80 [kernel]
>  0x811af662 : iput+0x62/0x70 [kernel]
>  0x37ff2e5347 : munmap+0x7/0x30 [/lib64/libc-2.12.so]
>  0x7ff169ba5d47 : Java_sun_nio_ch_FileChannelImpl_unmap0+0x17/0x50 
> [/usr/jdk1.8.0_66/jre/lib/amd64/libnio.so]
>  0x7ff269a1307e
> {code}
> We took a jstack dump of the broker process and found that tid = 126123 
> corresponds to the thread used for GC (hex(tid) == nid (Native Thread ID)):
> {code}
> $ grep 0x1ecab /tmp/jstack.dump
> "Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x7ff278d0c800 
> nid=0x1ecab in Object.wait() [0x7ff17da11000]
> {code}
> In order to confirm, we enabled {{PrintGCApplicationStoppedTime}} switch and 
> confirmed that in total the time the broker paused is longer than usual, 
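
The stack trace above bottoms out in munmap on a GC-owned frame because a
MappedByteBuffer's mapping is only released when the buffer object itself is
collected. A minimal standalone illustration (not broker code; the file path is
arbitrary):

{code}
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MmapLifecycle {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile f = new RandomAccessFile("/tmp/offset-index-demo", "rw")) {
            f.setLength(4096);
            MappedByteBuffer buf =
                f.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            buf.putLong(0, 42L);
            buf = null;   // the mapping is now unreachable...
        }                 // ...and closing the channel does NOT unmap it
        System.gc();      // munmap happens only when GC reclaims the buffer
    }
}
{code}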

[jira] [Commented] (KAFKA-4725) Kafka broker fails due to OOM when producer exceeds throttling quota for extended periods of time

2017-02-03 Thread Jeff Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851898#comment-15851898
 ] 

Jeff Chao commented on KAFKA-4725:
--

Ok, we'll base it off trunk and open up a PR. Thanks.

> Kafka broker fails due to OOM when producer exceeds throttling quota for 
> extended periods of time
> -
>
> Key: KAFKA-4725
> URL: https://issues.apache.org/jira/browse/KAFKA-4725
> Project: Kafka
>  Issue Type: Bug
>  Components: core, producer 
>Affects Versions: 0.10.1.1
> Environment: Ubuntu Trusty (14.04.5), Oracle JDK 8
>Reporter: Jeff Chao
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.3.0, 0.10.2.1
>
> Attachments: oom-references.png
>
>
> Steps to Reproduce:
> 1. Create a non-compacted topic with 1 partition
> 2. Set a produce quota of 512 KB/s
> 3. Send messages at 20 MB/s
> 4. Observe heap memory growth as time progresses
> Investigation:
> While running performance tests with a user configured with a produce quota, 
> we found that the lead broker serving the requests would exhaust heap memory 
> if the producer sustained an inbound request throughput greater than the 
> produce quota. 
> Upon further investigation, we took a heap dump from that broker process and 
> discovered the ThrottledResponse object has an indirect reference to the 
> byte[] holding the messages associated with the ProduceRequest. 
> We're happy to contribute a patch, but in the meantime wanted to first raise 
> the issue and get feedback from the community.



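The reference chain described above, reduced to a toy example; these are
illustrative stand-ins, not the broker's actual classes:

{code}
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// toy stand-in for a throttled response parked until the quota delay expires
class ParkedResponse implements Delayed {
    final byte[] requestPayload;   // keeps the whole produce payload reachable
    final long dueNanos;

    ParkedResponse(byte[] payload, long throttleMs) {
        this.requestPayload = payload;
        this.dueNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(throttleMs);
    }
    @Override public long getDelay(TimeUnit unit) {
        return unit.convert(dueNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }
    @Override public int compareTo(Delayed other) {
        return Long.compare(getDelay(TimeUnit.NANOSECONDS),
                            other.getDelay(TimeUnit.NANOSECONDS));
    }
}

class LeakDemo {
    public static void main(String[] args) {
        DelayQueue<ParkedResponse> throttled = new DelayQueue<>();
        // 20 MB/s in vs. 512 KB/s allowed: delays grow and payloads pile up
        for (int i = 0; i < 1_000; i++)
            throttled.add(new ParkedResponse(new byte[1 << 20], 60_000));
        // the ~1 GB above stays reachable until each delay expires
    }
}
{code}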


[jira] [Commented] (KAFKA-4725) Kafka broker fails due to OOM when producer exceeds throttling quota for extended periods of time

2017-02-03 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851885#comment-15851885
 ] 

Ismael Juma commented on KAFKA-4725:


Great. The target should be trunk; we cherry-pick to other branches during the 
merge.

> Kafka broker fails due to OOM when producer exceeds throttling quota for 
> extended periods of time
> -
>
> Key: KAFKA-4725
> URL: https://issues.apache.org/jira/browse/KAFKA-4725
> Project: Kafka
>  Issue Type: Bug
>  Components: core, producer 
>Affects Versions: 0.10.1.1
> Environment: Ubuntu Trusty (14.04.5), Oracle JDK 8
>Reporter: Jeff Chao
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.3.0, 0.10.2.1
>
> Attachments: oom-references.png
>
>
> Steps to Reproduce:
> 1. Create a non-compacted topic with 1 partition
> 2. Set a produce quota of 512 KB/s
> 3. Send messages at 20 MB/s
> 4. Observe heap memory growth as time progresses
> Investigation:
> While running performance tests with a user configured with a produce quota, 
> we found that the lead broker serving the requests would exhaust heap memory 
> if the producer sustained an inbound request throughput greater than the 
> produce quota. 
> Upon further investigation, we took a heap dump from that broker process and 
> discovered the ThrottledResponse object has an indirect reference to the 
> byte[] holding the messages associated with the ProduceRequest. 
> We're happy to contribute a patch, but in the meantime wanted to first raise 
> the issue and get feedback from the community.





[jira] [Commented] (KAFKA-4725) Kafka broker fails due to OOM when producer exceeds throttling quota for extended periods of time

2017-02-03 Thread Tim Carey-Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851880#comment-15851880
 ] 

Tim Carey-Smith commented on KAFKA-4725:


Hi there, 

Jeff and I have prototyped a fix for this bug. We repeated our stress tests 
against a new build and have not yet been able to reproduce the leak. 

The branch is hosted on GitHub at 
https://github.com/apache/kafka/compare/0.10.1.1...heroku:fix-throttled-response-leak

Before we open a PR, which base branch should we set as the target for the PR?

Thanks, 
Tim

> Kafka broker fails due to OOM when producer exceeds throttling quota for 
> extended periods of time
> -
>
> Key: KAFKA-4725
> URL: https://issues.apache.org/jira/browse/KAFKA-4725
> Project: Kafka
>  Issue Type: Bug
>  Components: core, producer 
>Affects Versions: 0.10.1.1
> Environment: Ubuntu Trusty (14.04.5), Oracle JDK 8
>Reporter: Jeff Chao
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.3.0, 0.10.2.1
>
> Attachments: oom-references.png
>
>
> Steps to Reproduce:
> 1. Create a non-compacted topic with 1 partition
> 2. Set a produce quota of 512 KB/s
> 3. Send messages at 20 MB/s
> 4. Observe heap memory growth as time progresses
> Investigation:
> While running performance tests with a user configured with a produce quota, 
> we found that the lead broker serving the requests would exhaust heap memory 
> if the producer sustained an inbound request throughput greater than the 
> produce quota. 
> Upon further investigation, we took a heap dump from that broker process and 
> discovered the ThrottledResponse object has an indirect reference to the 
> byte[] holding the messages associated with the ProduceRequest. 
> We're happy to contribute a patch, but in the meantime wanted to first raise 
> the issue and get feedback from the community.





Re: [DISCUSS] KIP-117: Add a public AdministrativeClient API for Kafka admin operations

2017-02-03 Thread Colin McCabe
On Thu, Feb 2, 2017, at 21:45, Becket Qin wrote:
> Hi Colin,
> 
> Thanks for the KIP. An admin client is probably a must after we block
> direct access to ZK. Some comments and thoughts below:
> 
> 1. Do we have a clear scope for the admin client? It might be worth
> thinking about the entire user experience of the admin client. Ideally we
> may want to have a single client to do all the administrative work
> instead
> of having multiple ways to do different things. For example, do we want
> to
> add topic configurations change API in the admin client? What about
> partition movements and preferred leader election? Those are also
> administrative tasks which seem reasonable to be integrated into the
> admin
> client.

Thanks for the comments, Becket!

I agree that topic configuration change should be in the administrative
client.  I have not thought about partition movement or preferred leader
election.  It probably makes sense to put them in the client as well,
but we should probably have a longer discussion about those features
later when someone is ready to implement them ;)

best,
Colin

> 
> 2. Regarding the Future based async Ops v.s. batching of Ops, I would
> prefer supporting batching if possible. That usually introduces much less
> overhead when doing some big operations, e.g. in controller we have been
> putting quite some efforts to batch the requests. For admin client, my
> understanding is that the operations are:
> a. rare and potentially big
> b. likely OK to block (it would be good to see some use cases where a
> nonblocking behavior is desired)
> c. the efficiency of the operation matters.
> Given the above three requirements, it seems a batching blocking API is
> fine?
> 
> Thanks,
> 
> Jiangjie (Becket) Qin
> 
> 
> 
> On Thu, Feb 2, 2017 at 5:54 PM, Dong Lin  wrote:
> 
> > Hey Colin,
> >
> > Thanks for the KIP. I have a few comments below:
> >
> > - I share similar view with Ismael that a Future-based API is better.
> > PurgeDataFrom() is an example API that user may want to do it
> > asynchronously even though there is only one request in flight at a time.
> > In the future we may also have some admin operation that allows user to
> > move replica from one broker to another, which also needs to work in both
> > sync and async style. It seems more generic to return Future for any API
> > that requires both modes.
> >
> > - I am not sure if it is the cleanest way to have enum classes
> > CreateTopicsFlags and DeleteTopicsFlags. Are we going to create such class
> > for every future API that requires configuration? It may be more generic to
> > provide Map to any admin API that operates on multiple entries.
> > For example, deleteTopic(Map). And it can be Map<String, Properties> for those APIs that require multiple configs per entry. And we
> > can provide default values, docs, and config names for those APIs as we do
> > with AbstractConfig.
> >
> > - I am not sure if "Try" is very intuitive to Java developers. Is there any
> > counterpart of scala's "Try" in java? We actually have quite a few existing
> > classes in Kafka that address the same problem, such as
> > ProduceRequestResult, LogAppendResult etc. Maybe we can follow the same
> > conversion and use *Result as this class name.
> >
> > - How are we going to allow user to specify timeout for blocking APIs such
> > as deleteTopic? Is this configured per AdminClient, or should it be
> > specified in the API parameter?
> >
> > - Are we going to have this class initiate its own thread, as we do with
> > Sender class, to send/receive requests? If yes, it will be useful to
> > have a class that extends AbstractConfig and specifies configs and their
> > default values.
> >
> > Thanks,
> > Dong
> >
> >
> >
> > On Thu, Feb 2, 2017 at 3:02 PM, Ismael Juma  wrote:
> >
> > > Hi Colin,
> > >
> > > Thanks for the KIP, great to make progress on this. I have some initial
> > > comments, will post more later:
> > >
> > > 1. We have KafkaProducer that implements the Producer interface and
> > > KafkaConsumer that implements the Consumer interface. Maybe we could have
> > > KafkaAdminClient that implements the AdminClient interface? Or maybe just
> > > KafkaAdmin. Not sure, some ideas for consideration. Also, I don't think
> > we
> > > should worry about a name clash with the internal AdminClient written in
> > > Scala. That will go away soon enough and choosing a good name for the
> > > public class is what matters.
> > >
> > > 2. We should include the proposed package name in the KIP
> > > (probably org.apache.kafka.clients.admin?).
> > >
> > > 3. It would be good to list the supported configs.
> > >
> > > 4. KIP-107, which passed the vote, specifies the introduction of a method
> > > to AdminClient with the following signature. We should figure out how it
> > > would look given this proposal.
> > >
> > > Future<Map<TopicPartition, PurgeDataResult>>
> > > purgeDataBefore(Map<TopicPartition, Long> offsetForPartition)

Re: [DISCUSS] KIP-118: Drop Support for Java 7 in Kafka 0.11

2017-02-03 Thread Roger Hoover
This is great.  Thanks, Ismael.

On Fri, Feb 3, 2017 at 7:35 AM, Grant Henke  wrote:

> Looks good to me. Thanks for handling the KIP.
>
> On Fri, Feb 3, 2017 at 8:49 AM, Damian Guy  wrote:
>
> > Thanks Ismael. Makes sense to me.
> >
> > On Fri, 3 Feb 2017 at 10:39 Ismael Juma  wrote:
> >
> > > Hi all,
> > >
> > > I have posted a KIP for dropping support for Java 7 in Kafka 0.11:
> > >
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 118%3A+Drop+Support+for+Java+7+in+Kafka+0.11
> > >
> > > Most people were supportive when we last discussed the topic[1], but
> > there
> > > were a few concerns. I believe the following should mitigate the
> > concerns:
> > >
> > > 1. The new proposal suggests dropping support in the next major version
> > > instead of the next minor version.
> > > 2. KIP-97 which is part of 0.10.2 means that 0.11 clients will support
> > 0.10
> > > brokers (0.11 brokers will also support 0.10 clients as usual), so
> there
> > is
> > > even more flexibility on incremental upgrades.
> > > 3. Java 9 will be released shortly after the next Kafka release, so
> we'll
> > > be supporting the 2 most recent Java releases, which is a reasonable
> > > policy.
> > > 4. 8 months have passed since the last proposal and the release after
> > > 0.10.2 won't be out for another 4 months, which should hopefully be
> > enough
> > > time for Java 8 to be even more established. We haven't decided when
> the
> > > next major release will happen, but we know that it won't happen before
> > > June 2017.
> > >
> > > Please take a look at the proposal and share your feedback.
> > >
> > > Thanks,
> > > Ismael
> > >
> > > [1] http://search-hadoop.com/m/Kafka/uyzND1oIhV61GS5Sf2
> > >
> >
>
>
>
> --
> Grant Henke
> Software Engineer | Cloudera
> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>


Re: [DISCUSS] KIP-117: Add a public AdministrativeClient API for Kafka admin operations

2017-02-03 Thread Colin McCabe
On Thu, Feb 2, 2017, at 17:54, Dong Lin wrote:
> Hey Colin,
> 
> Thanks for the KIP. I have a few comments below:
> 
> - I share similar view with Ismael that a Future-based API is better.
> PurgeDataFrom() is an example API that user may want to do it
> asynchronously even though there is only one request in flight at a time.
> In the future we may also have some admin operation that allows user to
> move replica from one broker to another, which also needs to work in both
> sync and async style. It seems more generic to return Future for any
> API
> that requires both modes.
> 
> - I am not sure if it is the cleanest way to have enum classes
> CreateTopicsFlags and DeleteTopicsFlags. Are we going to create such
> class
> for every future API that requires configuration? It may be more generic
> to
> provide Map to any admin API that operates on multiple
> entries.
> For example, deleteTopic(Map). And it can be Map<String, Properties> for those APIs that require multiple configs per entry. And
> we
> can provide default values, docs, and config names for those APIs as we do
> with AbstractConfig.

Thanks for the comments, Dong L.

EnumSet<CreateTopicsFlags>, EnumSet<DeleteTopicsFlags>, and so forth are
type-safe ways for the user to pass a set of boolean flags to the
function.  It is basically a replacement for having an API like
createTopics(..., boolean nonblocking, boolean validateOnly).  It is
preferable to having the boolean flags because it's immediately clear
when reading the code what createTopics(...,
CreateTopicsFlags.VALIDATE_ONLY) means, whereas it is not immediately
clear what createTopics(..., false, true) means.  It also prevents
having lots of function overloads over time, which becomes confusing for
users.  The EnumSet is not intended as a replacement for all possible
future arguments we might add, but just an easy way to add more boolean
arguments later without adding messy function overloads or type
signatures with many booleans.
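
A compact illustration of the readability argument; the enum values follow the
names in this thread, but the final API was still open at this point:

{code}
import java.util.EnumSet;

enum CreateTopicsFlags { NONBLOCKING, VALIDATE_ONLY }

class FlagsDemo {
    static void createTopics(String topic, EnumSet<CreateTopicsFlags> flags) {
        // ... issue the request, honoring the flags ...
    }

    public static void main(String[] args) {
        // self-documenting, unlike createTopics("t", false, true)
        createTopics("t", EnumSet.of(CreateTopicsFlags.VALIDATE_ONLY));
        createTopics("t", EnumSet.noneOf(CreateTopicsFlags.class));
    }
}
{code}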

Map is not type-safe, and I don't think we should use it in
our APIs.  Instead, we should just add function overloads if necessary.

> 
> - I am not sure if "Try" is very intuitive to Java developers. Is there any
> counterpart of scala's "Try" in java?

Unfortunately, there is no equivalent to Try in the standard Java
library.  That is the first place I checked, and I spent a while
searching.  The closest is probably Java 8's Optional.  However,
Optional just allows us to express Some(thing) or None, so callers would
not be able to determine what the error was.  

> We actually have quite a few
> existing
> classes in Kafka that address the same problem, such as
> ProduceRequestResult, LogAppendResult etc. Maybe we can follow the same
> conversion and use *Result as this class name.

Hmm.  ProduceRequestResult and LogAppendResult just store an exception
alongside the data, and make the caller responsible for checking whether
the exception is null (or None) before looking at the data.  I don't
think this is a good solution, because if the user forgets to do this,
they will interpret whatever is in the data fields as valid, even when
it is not.  ProduceRequestResult and LogAppendResult are also internal
classes, not user-facing APIs, so we did not spend time thinking about
how to make them easy to use for end-users.
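
For concreteness, a minimal Try-like holder of the kind being debated; purely an
illustration, not the class proposed in the KIP:

{code}
class Try<T> {
    private final T value;
    private final Throwable error;

    private Try(T value, Throwable error) { this.value = value; this.error = error; }
    static <T> Try<T> success(T value)       { return new Try<>(value, null); }
    static <T> Try<T> failure(Throwable err) { return new Try<>(null, err); }

    boolean isSuccess() { return error == null; }
    Throwable error()   { return error; }

    // unlike a bare result-plus-exception pair, get() refuses to hand back
    // garbage data if the caller forgot to check for an error first
    T get() {
        if (error != null) throw new IllegalStateException(error);
        return value;
    }
}
{code}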

> 
> - How are we going to allow user to specify timeout for blocking APIs
> such
> as deleteTopic? Is this configured per AdminClient, or should it be
> specified in the API parameter?

Right now, it is specified by the configuration of the AdminClient.

> 
> - Are we going to have this class initiate its own thread, as we do with
> Sender class, to send/receive requests? If yes, it will be useful to
> have a class that extends AbstractConfig and specifies configs and their
> default values.

Yes, I agree.  I will add this to the KIP.

best,
Colin

> 
> Thanks,
> Dong
> 
> 
> 
> On Thu, Feb 2, 2017 at 3:02 PM, Ismael Juma  wrote:
> 
> > Hi Colin,
> >
> > Thanks for the KIP, great to make progress on this. I have some initial
> > comments, will post more later:
> >
> > 1. We have KafkaProducer that implements the Producer interface and
> > KafkaConsumer that implements the Consumer interface. Maybe we could have
> > KafkaAdminClient that implements the AdminClient interface? Or maybe just
> > KafkaAdmin. Not sure, some ideas for consideration. Also, I don't think we
> > should worry about a name clash with the internal AdminClient written in
> > Scala. That will go away soon enough and choosing a good name for the
> > public class is what matters.
> >
> > 2. We should include the proposed package name in the KIP
> > (probably org.apache.kafka.clients.admin?).
> >
> > 3. It would be good to list the supported configs.
> >
> > 4. KIP-107, which passed the vote, specifies the introduction of a method
> > to AdminClient with the following signature. We should figure out how it
> > would look given this proposal.
> >
> > 

[GitHub] kafka pull request #2492: HOTFIX: update incorrect docs and broken links

2017-02-03 Thread mjsax
GitHub user mjsax opened a pull request:

https://github.com/apache/kafka/pull/2492

HOTFIX: update incorrect docs and broken links



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mjsax/kafka hotfixDocs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2492.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2492


commit 7e9f2bfef32fed153477ce4e026c7a073554c4b3
Author: Matthias J. Sax 
Date:   2017-02-03T18:16:45Z

HOTFIX: update incorrect docs and broken links




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [VOTE] 0.10.2.0 RC0

2017-02-03 Thread Tom Crayford
-1 (non-binding). Whilst testing 0.10.1.1, we found that there's a bug with
quotas that means clients can cause broker OOMs very easily. If a producer
has exceeded the produce quota, the produce request memory is retained
inside the delay queue.  (https://issues.apache.org/jira/browse/KAFKA-4725).
Seeing as this bug critically impacts broker stability in the face of quota
enforcement, I'd like to see a fix landed (we're working on a patch
already, see
https://github.com/apache/kafka/compare/0.10.1.1...heroku:fix-throttled-response-leak,
but it's not quite ready yet).

This bug is still present in 0.10.2.0 RC0

To quote KIP-13 on this matter:

> Note that in case of producers, we should be careful about retaining
references to large Message bodies because we could easily exhaust broker
memory if parking hundreds of requests.
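
To illustrate the failure mode, here is a schematic sketch (these are
not Kafka's actual classes; the point is only that the delayed element
transitively pins the request payload):

{code}
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// Schematic only. If completeResponse closes over the produce request's
// byte[], a DelayQueue<ThrottledResponse> keeps every throttled request's
// payload strongly reachable until its delay expires. With 20 MB/s coming
// in against a 512 KB/s quota, the queue grows far faster than it drains.
class ThrottledResponse implements Delayed {
    private final long dueNanos;
    final Runnable completeResponse; // may capture the request body transitively

    ThrottledResponse(long delayMs, Runnable completeResponse) {
        this.dueNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMs);
        this.completeResponse = completeResponse;
    }

    @Override
    public long getDelay(TimeUnit unit) {
        return unit.convert(dueNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        return Long.compare(getDelay(TimeUnit.NANOSECONDS),
                            other.getDelay(TimeUnit.NANOSECONDS));
    }
}
{code}

The fix is to ensure the queued element holds only what is needed to
complete the response, not a reference chain back to the request data.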

Tom Crayford


On Thu, Feb 2, 2017 at 8:01 PM, Mathieu Fenniak <
mathieu.fenn...@replicon.com> wrote:

> +1 (non-binding)
>
> Upgraded a KS app, custom KC connectors, and brokers, ran an end-to-end
> test suite.  Looks like a great release to me. :-)
>
> Mathieu
>
>
> On Wed, Feb 1, 2017 at 4:44 PM, Ewen Cheslack-Postava 
> wrote:
>
> > Hello Kafka users, developers and client-developers,
> >
> > This is the first candidate for release of Apache Kafka 0.10.2.0.
> >
> > This is a minor version release of Apache Kafka. It includes 19 new KIPs.
> > See the release notes and release plan (
> > https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+0.10.2.0)
> > for more details. A few feature highlights: SASL-SCRAM support, improved
> > client compatibility to allow use of clients newer than the broker,
> session
> > windows and global tables in the Kafka Streams API, single message
> > transforms in the Kafka Connect framework.
> >
> > Release notes for the 0.10.2.0 release:
> > http://home.apache.org/~ewencp/kafka-0.10.2.0-rc0/RELEASE_NOTES.html
> >
> > *** Please download, test and vote by Monday February 6th 5pm PST ***
> > (Note the longer window to vote to account for the normal 7 days ending
> in
> > the middle of the weekend.)
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > http://kafka.apache.org/KEYS
> >
> > * Release artifacts to be voted upon (source and binary):
> > http://home.apache.org/~ewencp/kafka-0.10.2.0-rc0/
> >
> > * Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/
> >
> > * Javadoc:
> > http://home.apache.org/~ewencp/kafka-0.10.2.0-rc0/javadoc/
> >
> > * Tag to be voted upon (off 0.10.2 branch) is the 0.10.2.0 tag:
> > https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
> > 33ebac1f138f17b86002df05e55a9f5cff47f48a
> >
> >
> > * Documentation:
> > http://kafka.apache.org/0102/documentation.html
> >
> > * Protocol:
> > http://kafka.apache.org/0102/protocol.html
> >
> > * Successful Jenkins builds for the 0.10.2 branch:
> > Unit/integration tests: https://builds.apache.org/job/
> > kafka-0.10.2-jdk7/60/
> > System tests:
> > https://jenkins.confluent.io/job/system-test-kafka-branch-builder/697
> >
> > Thanks,
> > Ewen Cheslack-Postava
> >
>


[jira] [Commented] (KAFKA-4614) Long GC pause harming broker performance which is caused by mmap objects created for OffsetIndex

2017-02-03 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851847#comment-15851847
 ] 

Uwe Schindler commented on KAFKA-4614:
--

Just to inform you: Cleaner is no longer available in Java 9, so you need 
another hack. In cooperation with Apache Lucene, Oracle added a stopgap that 
works differently for Java 9, until unmapping is properly supported in Java 10. 
I am traveling to FOSDEM at the moment, where I will talk about this and meet 
Mark Reinhold to discuss the Java 10 solution.

The Lucene issue with a non-reflective MethodHandle approach for unmapping in 
Java 7, 8 and 9 can be found here: LUCENE-6989
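
For context, the pre-Java-9 forced-unmap hack under discussion looks
roughly like the sketch below (sun.misc.Cleaner and sun.nio.ch.DirectBuffer
are JDK-internal, and as noted above this exact approach is gone in Java 9):

{code}
import java.nio.MappedByteBuffer;

class MmapUtil {
    // Releases the mapping immediately instead of waiting for the buffer
    // to be garbage collected (GC-triggered munmap is what shows up in the
    // stack trace above). Java 7/8 only; Java 9 removed sun.misc.Cleaner.
    static void unmap(MappedByteBuffer buffer) {
        sun.misc.Cleaner cleaner = ((sun.nio.ch.DirectBuffer) buffer).cleaner();
        if (cleaner != null)
            cleaner.clean();
    }
}
{code}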

> Long GC pause harming broker performance which is caused by mmap objects 
> created for OffsetIndex
> 
>
> Key: KAFKA-4614
> URL: https://issues.apache.org/jira/browse/KAFKA-4614
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.0.1
>Reporter: Yuto Kawamura
>Assignee: Yuto Kawamura
>  Labels: latency, performance
> Fix For: 0.10.2.0
>
> Attachments: kafka-produce-99th.png
>
>
> First let me clarify our system environment information as I think it's 
> important to understand this issue:
> OS: CentOS6
> Kernel version: 2.6.32-XX
> Filesystem used for data volume: XFS
> Java version: 1.8.0_66
> GC option: Kafka default(G1GC)
> Kafka version: 0.10.0.1
> h2. Phenomenon
> In our Kafka cluster, the usual response time for a Produce request is about 
> 1ms at the 50th percentile to 10ms at the 99th percentile. All topics are 
> configured to have 3 replicas and all producers are configured with 
> {{acks=all}}, so this time includes replication latency.
> Sometimes we observe the 99th percentile latency increase to 100ms ~ 500ms, 
> but in most cases the time-consuming part is "Remote", which means it is 
> caused by slow replication. That is known to happen for various reasons 
> (also an issue that we're trying to improve, but out of scope for this 
> issue).
> However, we found some different patterns which happen rarely but 
> consistently, 3 ~ 5 times a day on each server. The interesting part is 
> that "RequestQueue" increased as well as "Total" and "Remote".
> At the same time, we observed that the disk read metrics (in terms of both 
> read bytes and read time) spike at exactly the same moment. Currently we 
> have only caught-up consumers, so this metric sticks to zero while all 
> requests are served from the page cache.
> In order to investigate what Kafka is "read"ing, I employed SystemTap and 
> wrote the following script. It traces all disk reads (only actual reads by 
> the physical device) made on the data volume by the broker process.
> {code}
> global target_pid = KAFKA_PID
> global target_dev = DATA_VOLUME
> probe ioblock.request {
>   if (rw == BIO_READ && pid() == target_pid && devname == target_dev) {
>  t_ms = gettimeofday_ms() + 9 * 3600 * 1000 // timezone adjustment
>  printf("%s,%03d:  tid = %d, device = %s, inode = %d, size = %d\n", 
> ctime(t_ms / 1000), t_ms % 1000, tid(), devname, ino, size)
>  print_backtrace()
>  print_ubacktrace()
>   }
> }
> {code}
> As the result, we could observe many logs like below:
> {code}
> Thu Dec 22 17:21:39 2016,209:  tid = 126123, device = sdb1, inode = -1, size 
> = 4096
>  0x81275050 : generic_make_request+0x0/0x5a0 [kernel]
>  0x81275660 : submit_bio+0x70/0x120 [kernel]
>  0xa036bcaa : _xfs_buf_ioapply+0x16a/0x200 [xfs]
>  0xa036d95f : xfs_buf_iorequest+0x4f/0xe0 [xfs]
>  0xa036db46 : _xfs_buf_read+0x36/0x60 [xfs]
>  0xa036dc1b : xfs_buf_read+0xab/0x100 [xfs]
>  0xa0363477 : xfs_trans_read_buf+0x1f7/0x410 [xfs]
>  0xa033014e : xfs_btree_read_buf_block+0x5e/0xd0 [xfs]
>  0xa0330854 : xfs_btree_lookup_get_block+0x84/0xf0 [xfs]
>  0xa0330edf : xfs_btree_lookup+0xbf/0x470 [xfs]
>  0xa032456f : xfs_bmbt_lookup_eq+0x1f/0x30 [xfs]
>  0xa032628b : xfs_bmap_del_extent+0x12b/0xac0 [xfs]
>  0xa0326f34 : xfs_bunmapi+0x314/0x850 [xfs]
>  0xa034ad79 : xfs_itruncate_extents+0xe9/0x280 [xfs]
>  0xa0366de5 : xfs_inactive+0x2f5/0x450 [xfs]
>  0xa0374620 : xfs_fs_clear_inode+0xa0/0xd0 [xfs]
>  0x811affbc : clear_inode+0xac/0x140 [kernel]
>  0x811b0776 : generic_delete_inode+0x196/0x1d0 [kernel]
>  0x811b0815 : generic_drop_inode+0x65/0x80 [kernel]
>  0x811af662 : iput+0x62/0x70 [kernel]
>  0x37ff2e5347 : munmap+0x7/0x30 [/lib64/libc-2.12.so]
>  0x7ff169ba5d47 : Java_sun_nio_ch_FileChannelImpl_unmap0+0x17/0x50 
> [/usr/jdk1.8.0_66/jre/lib/amd64/libnio.so]
>  0x7ff269a1307e
> {code}
> We took a 

[jira] [Created] (KAFKA-4731) Add event-based session windows

2017-02-03 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-4731:
--

 Summary: Add event-based session windows
 Key: KAFKA-4731
 URL: https://issues.apache.org/jira/browse/KAFKA-4731
 Project: Kafka
  Issue Type: New Feature
  Components: streams
Reporter: Matthias J. Sax
Priority: Minor


Kafka Streams allows defining session windows based on a time-based 
inactivity gap. This can be used for _session detection_.

However, some data streams do contain events like "start session" and "end 
session" to mark the beginning and end of a session. For this use case, it is 
not required to _detect_ sessions, because session boundaries are known and 
are based not on time but on events.

This Jira is about adding support for those event-based session windows. To add 
this feature, a KIP 
(https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals) 
is required to discuss the proposed design.
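
For reference, today's gap-based session windows look roughly like the
sketch below ({{SessionWindows.with(gapMs)}} follows the 0.10.2 Streams
API; treat the exact method signatures as approximate):

{code}
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;
import org.apache.kafka.streams.kstream.SessionWindows;

public class GapBasedSessions {
    public static void main(String[] args) {
        KStreamBuilder builder = new KStreamBuilder();
        KStream<String, String> events = builder.stream("user-events");

        // Today: a session closes after 5 minutes of inactivity per key.
        // The feature requested here would instead close a session when an
        // explicit "end session" event arrives, regardless of elapsed time.
        events.groupByKey()
              .count(SessionWindows.with(5 * 60 * 1000L), "session-counts");
    }
}
{code}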



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [DISCUSS] KIP-111 : Kafka should preserve the Principal generated by the PrincipalBuilder while processing the request received on socket channel, on the broker.

2017-02-03 Thread Mayuresh Gharat
Hi All,

If there are no more concerns, I will like to start vote for this KIP.

Thanks,

Mayuresh

On Wed, Feb 1, 2017 at 8:38 PM, Mayuresh Gharat 
wrote:

> Hi Dong,
>
> What I meant was "Right now Kafka just extracts the name out of the
> Principal that is generated by the PrincipalBuilder. Instead of doing that
> if it preserves the Principal itself, this issue can be addressed".
>
> May be I should have used the word "preserve" instead of "stores". I have
> updated the wording in the KIP.
>
> Thanks,
>
> Mayuresh
>
> On Wed, Feb 1, 2017 at 8:30 PM, Dong Lin  wrote:
>
>> The last paragraph of the motivation section is a bit confusing. I guess
>> you want to say "This issue can be addressed if the Session class stores
>> the Principal object extracted from a request".
>>
>> I like the approach of changing Session class to be case class
>> *Session(principal:
>> KafkaPrincipal, clientAddress: InetAddress)* under the assumption that the
>> Session class doesn't really need principalType of the KafkaPrincipal. I
>> am
>> wondering if anyone in the open source mailing list knows why we need to
>> have principalType in KafkaPrincipal.
>>
>> For the record, I actually prefer that we use the existing configure() to
>> provide properties to PrincipalBuilder instead of adding the method
>> *buildPrincipal(Map<String, ?> principalConfigs)* in the PrincipalBuilder interface. But this is not a
>> blocking issue for me.
>>
>>
>>
>>
>> On Wed, Feb 1, 2017 at 2:54 PM, Mayuresh Gharat <
>> gharatmayures...@gmail.com>
>> wrote:
>>
>> > Hi All,
>> >
>> > I have updated the KIP as per our discussion here.
>> > It would be great if you can take another look and let me know if there
>> are
>> > any concerns.
>> >
>> > Thanks,
>> >
>> > Mayuresh
>> >
>> > On Sat, Jan 28, 2017 at 6:10 PM, Mayuresh Gharat <
>> > gharatmayures...@gmail.com
>> > > wrote:
>> >
>> > > I had offline discussions with Joel, Dong and Radai.
>> > >
>> > > I agree that we can replace the KafkaPrincipal in Session with the
>> > > ChannelPrincipal.
>> > > KafkaPrincipal can be provided as an out of box implementation.
>> > >
>> > > The only gotcha is that users will have to implement their own
>> > > Authorizer, if they decide to use their own PrincipalBuilder in
>> > > kafka-acls.sh.
>> > >
>> > > I will update the KIP accordingly.
>> > >
>> > > Thanks,
>> > >
>> > > Mayuresh
>> > >
>> > > On Thu, Jan 26, 2017 at 6:01 PM, Mayuresh Gharat <
>> > > gharatmayures...@gmail.com> wrote:
>> > >
>> > >> Hi Dong,
>> > >>
>> > >> Thanks for the review. Please see the replies inline.
>> > >>
>> > >>
>> > >> 1. I am not sure we need to add the method
>> > >> buildPrincipal(Map<String, ?> principalConfigs). It seems that user can
>> > >> simply do principalBuilder.configure(...).buildPrincipal(...) without
>> > >> using that method.
>> > >> -> I am not sure if I understand the question.
>> > >> buildPrincipal(Map<String, ?> principalConfigs) will be used to build
>> > >> individual Principals from the passed-in configs. Each Principal can be
>> > >> a different type, and the PrincipalBuilder is responsible for handling
>> > >> those configs correctly and building those Principals.
>> > >>
>> > >> 2. Is there any reason specific reason that we should put the
>> > >> channelPrincipal in KafkaPrincipal class instead of the Session
>> class?
>> > If
>> > >> they work equally well to serve the use-case of this KIP, then it
>> seems
>> > >> better to put this field in the Session class to avoid changing
>> > interface
>> > >> that needs to be implemented by custom principal.
>> > >> -> Doing this might be backwards incompatible, as we need to
>> > >> preserve the existing behavior of kafka-acls.sh. Also, as we have a
>> > >> field PrincipalType which can be used in the future if Kafka decides
>> > >> to support different Principal types (currently it just says "User"),
>> > >> we might lose that functionality.
>> > >>
>> > >> Thanks,
>> > >>
>> > >> Mayuresh
>> > >>
>> > >>
>> > >> On Tue, Jan 24, 2017 at 3:35 PM, Dong Lin 
>> wrote:
>> > >>
>> > >>> Hey Mayuresh,
>> > >>>
>> > >>> Thanks for the KIP. I actually like the suggestions by Ismael and
>> Jun.
>> > >>> Here
>> > >>> are my comments:
>> > >>>
>> > >>> 1. I am not sure we need to add the method
>> > >>> buildPrincipal(Map<String, ?> principalConfigs). It seems that user can
>> > >>> simply do principalBuilder.configure(...).buildPrincipal(...) without
>> > >>> using that method.
>> > >>>
>> > >>> 2. Is there any reason specific reason that we should put the
>> > >>> channelPrincipal in KafkaPrincipal class instead of the Session
>> class?
>> > If
>> > >>> they work equally well to serve the use-case of this KIP, then it
>> seems
>> > >>> better to put this field in the Session class to avoid changing
>> > interface
>> > >>> that needs to be implemented by custom principal.
>> > >>>
>> > >>> Dong
>> > 

[jira] [Commented] (KAFKA-4646) Improve test coverage AbstractProcessorContext

2017-02-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851758#comment-15851758
 ] 

ASF GitHub Bot commented on KAFKA-4646:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2447


> Improve test coverage AbstractProcessorContext
> --
>
> Key: KAFKA-4646
> URL: https://issues.apache.org/jira/browse/KAFKA-4646
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Damian Guy
>Assignee: Damian Guy
> Fix For: 0.10.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4646) Improve test coverage AbstractProcessorContext

2017-02-03 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-4646:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Issue resolved by pull request 2447
[https://github.com/apache/kafka/pull/2447]

> Improve test coverage AbstractProcessorContext
> --
>
> Key: KAFKA-4646
> URL: https://issues.apache.org/jira/browse/KAFKA-4646
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Damian Guy
>Assignee: Damian Guy
> Fix For: 0.10.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2447: KAFKA-4646: Improve test coverage AbstractProcesso...

2017-02-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2447


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #2452: KAFKA-4649: Improve test coverage GlobalStateManag...

2017-02-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2452


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4649) Improve test coverage GlobalStateManagerImpl

2017-02-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851745#comment-15851745
 ] 

ASF GitHub Bot commented on KAFKA-4649:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2452


> Improve test coverage GlobalStateManagerImpl
> 
>
> Key: KAFKA-4649
> URL: https://issues.apache.org/jira/browse/KAFKA-4649
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Damian Guy
>Assignee: Damian Guy
>Priority: Minor
> Fix For: 0.10.3.0
>
>
> Exception paths in {{initialize}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4649) Improve test coverage GlobalStateManagerImpl

2017-02-03 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-4649:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Issue resolved by pull request 2452
[https://github.com/apache/kafka/pull/2452]

> Improve test coverage GlobalStateManagerImpl
> 
>
> Key: KAFKA-4649
> URL: https://issues.apache.org/jira/browse/KAFKA-4649
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Damian Guy
>Assignee: Damian Guy
>Priority: Minor
> Fix For: 0.10.3.0
>
>
> Exception paths in {{initialize}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4725) Kafka broker fails due to OOM when producer exceeds throttling quota for extended periods of time

2017-02-03 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851730#comment-15851730
 ] 

Ismael Juma commented on KAFKA-4725:


Nice catch, a contribution via a PR would be welcome indeed.

> Kafka broker fails due to OOM when producer exceeds throttling quota for 
> extended periods of time
> -
>
> Key: KAFKA-4725
> URL: https://issues.apache.org/jira/browse/KAFKA-4725
> Project: Kafka
>  Issue Type: Bug
>  Components: core, producer 
>Affects Versions: 0.10.1.1
> Environment: Ubuntu Trusty (14.04.5), Oracle JDK 8
>Reporter: Jeff Chao
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.3.0, 0.10.2.1
>
> Attachments: oom-references.png
>
>
> Steps to Reproduce:
> 1. Create a non-compacted topic with 1 partition
> 2. Set a produce quota of 512 KB/s
> 3. Send messages at 20 MB/s
> 4. Observe heap memory growth as time progresses
> Investigation:
> While running performance tests with a user configured with a produce quota, 
> we found that the lead broker serving the requests would exhaust heap memory 
> if the producer sustained an inbound request throughput greater than the 
> produce quota. 
> Upon further investigation, we took a heap dump from that broker process and 
> discovered the ThrottledResponse object has an indirect reference to the 
> byte[] holding the messages associated with the ProduceRequest. 
> We're happy to contribute a patch, but in the meantime we wanted to first 
> raise the issue and get feedback from the community.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4725) Kafka broker fails due to OOM when producer exceeds throttling quota for extended periods of time

2017-02-03 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4725:
---
Fix Version/s: 0.10.3.0




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4725) Kafka broker fails due to OOM when producer exceeds throttling quota for extended periods of time

2017-02-03 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4725:
---
Priority: Critical  (was: Major)




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4725) Kafka broker fails due to OOM when producer exceeds throttling quota for extended periods of time

2017-02-03 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4725:
---
Fix Version/s: 0.10.2.1




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4725) Kafka broker fails due to OOM when producer exceeds throttling quota for extended periods of time

2017-02-03 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4725:
---
Labels: reliability  (was: )




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka-site pull request #43: [DO NOT MERGE] manual changes after copy-paste ...

2017-02-03 Thread guozhangwang
GitHub user guozhangwang opened a pull request:

https://github.com/apache/kafka-site/pull/43

[DO NOT MERGE] manual changes after copy-paste to 0.10.2

@derrickdoo 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/guozhangwang/kafka-site KMinor-0.10.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka-site/pull/43.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #43






---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #2464: KAFKA-4662: adding test coverage for addSource met...

2017-02-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2464


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4662) Improve test coverage TopologyBuilder

2017-02-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851710#comment-15851710
 ] 

ASF GitHub Bot commented on KAFKA-4662:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2464


> Improve test coverage TopologyBuilder
> -
>
> Key: KAFKA-4662
> URL: https://issues.apache.org/jira/browse/KAFKA-4662
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Damian Guy
>Assignee: Bill Bejeck
>Priority: Minor
> Fix For: 0.10.3.0
>
>
> overloaded {{addSource}} methods with {{AutoOffsetReset}} param not tested.
> Also some exception branches not covered



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4662) Improve test coverage TopologyBuilder

2017-02-03 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-4662:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Issue resolved by pull request 2464
[https://github.com/apache/kafka/pull/2464]

> Improve test coverage TopologyBuilder
> -
>
> Key: KAFKA-4662
> URL: https://issues.apache.org/jira/browse/KAFKA-4662
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Damian Guy
>Assignee: Bill Bejeck
>Priority: Minor
> Fix For: 0.10.3.0
>
>
> overloaded {{addSource}} methods with {{AutoOffsetReset}} param not tested.
> Also some exception branches not covered



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: KIP-119: Drop Support for Scala 2.10 in Kafka 0.11

2017-02-03 Thread Ismael Juma
Hi Grant,

That's an interesting point. It would be good to hear what others think of
making that the official policy instead of starting a discussion/vote each
time. If there is consensus, I am happy to revise the KIPs. Otherwise, we
keep them as they are and discuss/vote on this instance only.

Ismael

On Fri, Feb 3, 2017 at 3:41 PM, Grant Henke  wrote:

> Thanks for proposing this Ismael. This makes sense to me.
>
> In this KIP and the java KIP you state:
>
> A reasonable policy is to support the 2 most recently released versions so
> > that we can strike a good balance between supporting older versions,
> > maintainability and taking advantage of language and library
> improvements.
>
>
> What do you think about adjusting the KIP to instead vote on that as a
> standard policy for Java and Scala going forward? Something along the lines
> of:
>
> "Kafka's policy is to support the 2 most recently released versions of Java
> and Scala at a given time. When a new version becomes available, the
> supported versions will be updated in the next major release of Kafka."
>
>
> On Fri, Feb 3, 2017 at 8:30 AM, Ismael Juma  wrote:
>
> > Hi all,
> >
> > I have posted a KIP for dropping support for Scala 2.10 in Kafka 0.11:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 119%3A+Drop+Support+for+Scala+2.10+in+Kafka+0.11
> >
> > Please take a look. Your feedback is appreciated.
> >
> > Thanks,
> > Ismael
> >
>
>
>
> --
> Grant Henke
> Software Engineer | Cloudera
> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>


Re: Trying to understand design decision about producer ack and min.insync.replicas

2017-02-03 Thread Grant Henke
I would be in favor of defaulting acks=all.

I have found that most people want to start with the stronger/safer
guarantees and then adjust them for performance on a case by case basis.
This gives them a chance to understand and accept the tradeoffs.

A few other defaults I would be in favor of changing (some are harder and
more controversial than others) are:

Broker:

   - zookeeper.chroot=kafka (was "")
   - This will be easiest when direct communication to zookeeper isn't done
  by clients

Producer:

   - block.on.buffer.full=true (was false)
   - max.in.flight.requests.per.connection=1 (was 5)

All:

   - *receive.buffer.bytes=-1 (was 102400)
   - *send.buffer.bytes=-1 (was 102400)
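
Expressed as explicit producer configuration, the client-side changes
above would look like the following sketch (the keys are real producer
configs; note that block.on.buffer.full was deprecated in favor of
max.block.ms, and this is a sketch of the proposed defaults, not a
recommendation for every workload):

{code}
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class SaferProducerDefaults {
    static Properties props() {
        Properties props = new Properties();
        props.put(ProducerConfig.ACKS_CONFIG, "all");                         // proposed default (was "1")
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "1"); // proposed default (was 5)
        props.put(ProducerConfig.RECEIVE_BUFFER_CONFIG, "-1");                // -1 = use the OS default
        props.put(ProducerConfig.SEND_BUFFER_CONFIG, "-1");
        return props;
    }
}
{code}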




On Fri, Feb 3, 2017 at 2:03 AM, Ismael Juma  wrote:

> I'd be in favour too.
>
> Ismael
>
> On 3 Feb 2017 7:33 am, "Ewen Cheslack-Postava"  wrote:
>
> > On Thu, Feb 2, 2017 at 11:21 PM, James Cheng 
> wrote:
> >
> > > Ewen,
> > >
> > > Ah right, that's a good point.
> > >
> > > My initial reaction to your examples was that "well, those should be in
> > > separate topics", but then I realized that people choose their topics
> > for a
> > > variety of reasons. Sometimes they organize it based on their
> producers,
> > > sometimes they organize it based on the nature of the data, but
> sometimes
> > > (as you gave examples about), they may organize it based on the
> consuming
> > > application. And there are valid reason to want different data types
> in a
> > > single topic:
> > >
> > > 1) You get global ordering
> > > 2) You get persistent ordering in the case of re-reads (whereas reading
> > > 2 topics would cause different ordering upon re-reads.)
> > > 3) Logically-related data types all co-located.
> > >
> > > I do still think it'd be convenient to only have to set
> > > min.insync.replicas on a topic and not have to require producing
> > > applications to also set acks=all. It'd then be a single thing you have
> > to
> > > configure, instead of the current 2 things. (since, as currently
> > > implemented, you have to set both things, in order to achieve high
> > > durability.)
> > >
> >
> > I entirely agree, I think the default should be acks=all and then this
> > would be true :) Similar to the unclean leader election setting, I think
> > defaulting to durable by default is a better choice. I understand
> > historically why a different choice was made (Kafka didn't start out as a
> > replicated, durable storage system), but given how it has evolved I think
> > durable by default would be a better choice on both the broker and
> > producer.
> >
> >
> > >
> > > But I admit that it's hard to find the balance of features/simplicity/
> > complexity,
> > > to handle all the use cases.
> > >
> >
> > Perhaps the KIP-106 adjustment to unclean leader election could benefit
> > from a sister KIP for adjusting the default producer acks setting?
> >
> > Not sure how popular it would be, but I would be in favor.
> >
> > -Ewen
> >
> >
> > >
> > > Thanks,
> > > -James
> > >
> > > > On Feb 2, 2017, at 9:42 PM, Ewen Cheslack-Postava  >
> > > wrote:
> > > >
> > > > James,
> > > >
> > > > Great question, I probably should have been clearer. log data is an
> > > example
> > > > where the app (or even instance of the app) might know best what the
> > > right
> > > > tradeoff is. Depending on your strategy for managing logs, you may or
> > may
> > > > not be mixing multiple logs (and logs from different deployments)
> into
> > > the
> > > > same topic. For example, if you key by application, then you have an
> > easy
> > > > way to split logs up while still getting a global feed of log
> messages.
> > > > Maybe logs from one app are really critical and we want to retry, but
> > > from
> > > > another app are just a nice to have.
> > > >
> > > > There are other examples even within a single app. For example, a
> > gaming
> > > > company might report data from a user of a game to the same topic but
> > > want
> > > > 2 producers with different reliability levels (and possibly where the
> > > > ordering constraints across the two sets that might otherwise cause
> you
> > > to
> > > > use a single consumer are not an issue). High frequency telemetry on
> a
> > > > player might be desirable to have, but not the end of the world if
> some
> > > is
> > > > lost. In contrast, they may want a stronger guarantee for, e.g.,
> > sometime
> > > > like chat messages, where they want to have a permanent record of
> them
> > in
> > > > all circumstances.
> > > >
> > > > -Ewen
> > > >
> > > > On Fri, Jan 27, 2017 at 12:59 AM, James Cheng 
> > > wrote:
> > > >
> > > >>
> > > >>> On Jan 27, 2017, at 12:18 AM, Ewen Cheslack-Postava <
> > e...@confluent.io
> > > >
> > > >> wrote:
> > > >>>
> > > >>> On Thu, Jan 26, 2017 at 4:23 PM, Luciano Afranllie <
> > > >> listas.luaf...@gmail.com
> > >  wrote:
> > > >>>
> > >  I was thinking about 

Re: KIP-119: Drop Support for Scala 2.10 in Kafka 0.11

2017-02-03 Thread Grant Henke
Thanks for proposing this Ismael. This makes sense to me.

In this KIP and the java KIP you state:

A reasonable policy is to support the 2 most recently released versions so
> that we can strike a good balance between supporting older versions,
> maintainability and taking advantage of language and library improvements.


What do you think about adjusting the KIP to instead vote on that as a
standard policy for Java and Scala going forward? Something along the lines
of:

"Kafka's policy is to support the 2 most recently released versions of Java
and Scala at a given time. When a new version becomes available, the
supported versions will be updated in the next major release of Kafka."


On Fri, Feb 3, 2017 at 8:30 AM, Ismael Juma  wrote:

> Hi all,
>
> I have posted a KIP for dropping support for Scala 2.10 in Kafka 0.11:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 119%3A+Drop+Support+for+Scala+2.10+in+Kafka+0.11
>
> Please take a look. Your feedback is appreciated.
>
> Thanks,
> Ismael
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: [DISCUSS] KIP-118: Drop Support for Java 7 in Kafka 0.11

2017-02-03 Thread Grant Henke
Looks good to me. Thanks for handling the KIP.

On Fri, Feb 3, 2017 at 8:49 AM, Damian Guy  wrote:

> Thanks Ismael. Makes sense to me.
>
> On Fri, 3 Feb 2017 at 10:39 Ismael Juma  wrote:
>
> > Hi all,
> >
> > I have posted a KIP for dropping support for Java 7 in Kafka 0.11:
> >
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 118%3A+Drop+Support+for+Java+7+in+Kafka+0.11
> >
> > Most people were supportive when we last discussed the topic[1], but
> there
> > were a few concerns. I believe the following should mitigate the
> concerns:
> >
> > 1. The new proposal suggests dropping support in the next major version
> > instead of the next minor version.
> > 2. KIP-97 which is part of 0.10.2 means that 0.11 clients will support
> 0.10
> > brokers (0.11 brokers will also support 0.10 clients as usual), so there
> is
> > even more flexibility on incremental upgrades.
> > 3. Java 9 will be released shortly after the next Kafka release, so we'll
> > be supporting the 2 most recent Java releases, which is a reasonable
> > policy.
> > 4. 8 months have passed since the last proposal and the release after
> > 0.10.2 won't be out for another 4 months, which should hopefully be
> enough
> > time for Java 8 to be even more established. We haven't decided when the
> > next major release will happen, but we know that it won't happen before
> > June 2017.
> >
> > Please take a look at the proposal and share your feedback.
> >
> > Thanks,
> > Ismael
> >
> > [1] http://search-hadoop.com/m/Kafka/uyzND1oIhV61GS5Sf2
> >
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2017-02-03 Thread Ismael Juma
Comments inline.

On Thu, Feb 2, 2017 at 6:28 PM, Jason Gustafson  wrote:

> Took me a while to remember why we didn't do this. The timestamp that is
> included at the message set level is the max timestamp of all messages in
> the message set as is the case in the current message format (I will update
> the document to make this explicit). We could make the message timestamps
> relative to the max timestamp, but that makes serialization a bit awkward
> since the timestamps are not assumed to be increasing sequentially or
> monotonically. Once the messages in the message set had been determined, we
> would need to go back and adjust the relative timestamps.
>

Yes, I thought this would be a bit tricky and hence why I mentioned the
option of adding a new field at the message set level for the first
timestamp even though that's not ideal either.

Here's one idea. We let the timestamps in the messages be varints, but we
> make their values be relative to the timestamp of the previous message,
> with the timestamp of the first message being absolute. For example, if we
> had timestamps 500, 501, 499, then we would write 500 for the first
> message, 1 for the next, and -2 for the final message. Would that work? Let
> me think a bit about it and see if there are any problems.
>

It's an interesting idea. Compared to the option of having the first
timestamp in the message set, it's a little more space-efficient, as we
don't have both a full timestamp in the message set _and_ a varint in the
first message (which would always be 0, so we avoid the extra byte), and
the deltas could also be a little smaller in the common case. The main
downside is that it introduces a semantic inconsistency between the first
message and the rest. Not ideal, but maybe we can live with that.
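
To make the encoding concrete, here is a minimal sketch using zigzag
varints (illustrative only, not actual Kafka serialization code):

{code}
import java.io.ByteArrayOutputStream;

public class TimestampDeltas {
    // Zigzag-encode so that small negative deltas (e.g. -2) stay small on the wire.
    static void writeVarlong(ByteArrayOutputStream out, long v) {
        long z = (v << 1) ^ (v >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // First timestamp is absolute; each subsequent one is a delta vs. the
    // previous message. For 500, 501, 499 this writes 500, +1, -2.
    static byte[] encode(long[] timestamps) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long prev = 0;
        for (int i = 0; i < timestamps.length; i++) {
            writeVarlong(out, i == 0 ? timestamps[i] : timestamps[i] - prev);
            prev = timestamps[i];
        }
        return out.toByteArray();
    }
}
{code}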

Ismael


Re: [DISCUSS] KIP-118: Drop Support for Java 7 in Kafka 0.11

2017-02-03 Thread Damian Guy
Thanks Ismael. Makes sense to me.

On Fri, 3 Feb 2017 at 10:39 Ismael Juma  wrote:

> Hi all,
>
> I have posted a KIP for dropping support for Java 7 in Kafka 0.11:
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-118%3A+Drop+Support+for+Java+7+in+Kafka+0.11
>
> Most people were supportive when we last discussed the topic[1], but there
> were a few concerns. I believe the following should mitigate the concerns:
>
> 1. The new proposal suggests dropping support in the next major version
> instead of the next minor version.
> 2. KIP-97 which is part of 0.10.2 means that 0.11 clients will support 0.10
> brokers (0.11 brokers will also support 0.10 clients as usual), so there is
> even more flexibility on incremental upgrades.
> 3. Java 9 will be released shortly after the next Kafka release, so we'll
> be supporting the 2 most recent Java releases, which is a reasonable
> policy.
> 4. 8 months have passed since the last proposal and the release after
> 0.10.2 won't be out for another 4 months, which should hopefully be enough
> time for Java 8 to be even more established. We haven't decided when the
> next major release will happen, but we know that it won't happen before
> June 2017.
>
> Please take a look at the proposal and share your feedback.
>
> Thanks,
> Ismael
>
> [1] http://search-hadoop.com/m/Kafka/uyzND1oIhV61GS5Sf2
>


KIP-119: Drop Support for Scala 2.10 in Kafka 0.11

2017-02-03 Thread Ismael Juma
Hi all,

I have posted a KIP for dropping support for Scala 2.10 in Kafka 0.11:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-119%3A+Drop+Support+for+Scala+2.10+in+Kafka+0.11

Please take a look. Your feedback is appreciated.

Thanks,
Ismael


[jira] [Updated] (KAFKA-4422) Drop support for Scala 2.10

2017-02-03 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4422:
---
Labels: kip  (was: )

> Drop support for Scala 2.10
> ---
>
> Key: KAFKA-4422
> URL: https://issues.apache.org/jira/browse/KAFKA-4422
> Project: Kafka
>  Issue Type: Task
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>  Labels: kip
> Fix For: 0.11.0.0
>
>
> Now that Scala 2.12 has been released, we should drop support for Scala 2.10  
> in the next major Kafka version so that we keep the number of supported 
> versions at 2. Since we have to compile and run the tests on each supported 
> version, there is a non-trivial cost from a development and testing 
> perspective.
> The clients library is in Java and we recommend people use the Java clients 
> instead of the Scala ones, so dropping support for Scala 2.10 should have a 
> smaller impact than it would have had in the past. Scala 2.10 was released in 
> January 2013 and support ended in March 2015. 
> Once we drop support for Scala 2.10, we can take advantage of APIs and 
> compiler improvements introduced in Scala 2.11 (introduced in April 2014): 
> http://scala-lang.org/news/2.11.0
> Before we do this, we should start a discussion thread followed by a vote in 
> the mailing list.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4422) Drop support for Scala 2.10

2017-02-03 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4422:
---
Description: 
Now that Scala 2.12 has been released, we should drop support for Scala 2.10  
in the next major Kafka version so that we keep the number of supported 
versions at 2. Since we have to compile and run the tests on each supported 
version, there is a non-trivial cost from a development and testing perspective.

The clients library is in Java and we recommend people use the Java clients 
instead of the Scala ones, so dropping support for Scala 2.10 should have a 
smaller impact than it would have had in the past. Scala 2.10 was released in 
January 2013 and support ended in March 2015. 

Once we drop support for Scala 2.10, we can take advantage of APIs and compiler 
improvements introduced in Scala 2.11 (introduced in April 2014): 
http://scala-lang.org/news/2.11.0

Link to KIP:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-119%3A+Drop+Support+for+Scala+2.10+in+Kafka+0.11

  was:
Now that Scala 2.12 has been released, we should drop support for Scala 2.10  
in the next major Kafka version so that we keep the number of supported 
versions at 2. Since we have to compile and run the tests on each supported 
version, there is a non-trivial cost from a development and testing perspective.

The clients library is in Java and we recommend people use the Java clients 
instead of the Scala ones, so dropping support for Scala 2.10 should have a 
smaller impact than it would have had in the past. Scala 2.10 was released in 
January 2013 and support ended in March 2015. 

Once we drop support for Scala 2.10, we can take advantage of APIs and compiler 
improvements introduced in Scala 2.11 (introduced in April 2014): 
http://scala-lang.org/news/2.11.0

Before we do this, we should start a discussion thread followed by a vote in 
the mailing list.


> Drop support for Scala 2.10
> ---
>
> Key: KAFKA-4422
> URL: https://issues.apache.org/jira/browse/KAFKA-4422
> Project: Kafka
>  Issue Type: Task
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>  Labels: kip
> Fix For: 0.11.0.0
>
>
> Now that Scala 2.12 has been released, we should drop support for Scala 2.10  
> in the next major Kafka version so that we keep the number of supported 
> versions at 2. Since we have to compile and run the tests on each supported 
> version, there is a non-trivial cost from a development and testing 
> perspective.
> The clients library is in Java and we recommend people use the Java clients 
> instead of the Scala ones, so dropping support for Scala 2.10 should have a 
> smaller impact than it would have had in the past. Scala 2.10 was released in 
> January 2013 and support ended in March 2015. 
> Once we drop support for Scala 2.10, we can take advantage of APIs and 
> compiler improvements introduced in Scala 2.11 (introduced in April 2014): 
> http://scala-lang.org/news/2.11.0
> Link to KIP:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-119%3A+Drop+Support+for+Scala+2.10+in+Kafka+0.11



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2017-02-03 Thread Rajini Sivaram
I have a few questions on security (sorry, only just catching up on the
updates).

1. Will the transaction coordinator check topic ACLs based on the
requesting client's credentials? Access to transaction logs, topics being
added to a transaction, etc.?
2. If I create a transactional produce request (by hand, not using the
producer API) with a random PID (random, hence unlikely to be in use), will
the broker append a transactional message to the logs, preventing the LSO
from moving forward? What validation will the broker do for PIDs?
3. Will every broker check that a client sending transactional produce
requests at least has write access to the transaction log topic, since it
is not validating transactional.id for every produce request?
4. I understand that brokers cannot authorize the transactional id for each
produce request since requests contain only the PID. But since there is a
one-to-one mapping between PID and transactional.id, and a connection is
never expected to change its transactional.id, perhaps it is feasible to
add authorization and cache the results in the Session? Perhaps not for
version 1, but feels like it will be good to close the security gap here.
Obviously it would be simpler if transactional.id was in the produce
request if the overhead was acceptable.

Thank you,

Rajini


On Thu, Feb 2, 2017 at 8:37 PM, Ismael Juma  wrote:

> Yes, I'd also prefer the option where we only have a checksum at the
> message set level. I didn't suggest it due to the mentioned auditing use
> cases, but if they can be satisfied in some other way, then that would be
> great.
>
> Ismael
>
> On 2 Feb 2017 7:03 pm, "Jason Gustafson"  wrote:
>
> One more:
>
> 1. I did some benchmarking and CRC32C seems to be a massive win when using
> > the hardware instruction (particularly for messages larger than 65k), so
> > I'm keen on taking advantage of the message format version bump to add
> > support for it. I can write a separate KIP for this as it's not tied to
> > Exactly-once, but it would be good to include the code change in the same
> > PR that bumps the message format version. The benchmark and results can
> be
> > found in the following link:
> > https://gist.github.com/ijuma/e58ad79d489cb831b290e83b46a7d7bb.
>
>
> Yeah, makes sense. We can add this to this KIP or do it separately,
> whichever you prefer. I have also been very interested in removing the
> individual message CRCs. The main reason we haven't done so is because some
> auditing applications depend on them, but there are cases where it's
> already unsafe to depend on the message CRCs not changing on the broker
> (message conversion and the use of log append time can both result in new
> message-level crcs). So I'm wondering a bit about the use cases that
> require the message CRCs and how they handle this. Perhaps if it is not
> dependable anyway, we can remove it and save some space and computation.
>
> -Jason
>
>
> On Thu, Feb 2, 2017 at 10:28 AM, Jason Gustafson 
> wrote:
>
> > Hey Ismael,
> >
> > 2. The message timestamp field is 8 bytes. Did we consider storing the
> >> first timestamp in the message set and then storing deltas using varints
> >> in
> >> the messages like we do for offsets (the difference would be the usage
> of
> >> signed varints)? It seems like the deltas would be quite a bit smaller
> in
> >> the common case (potentially 0 for log append time, so we could even not
> >> store them at all using attributes like we do for key/value lengths). An
> >> alternative is using MaxTimestamp that is already present in the message
> >> set and computing deltas from that, but that seems more complicated. In
> >> any
> >> case, details aside, was this idea considered and rejected or is it
> worth
> >> exploring further?
> >
> >
> > Took me a while to remember why we didn't do this. The timestamp that is
> > included at the message set level is the max timestamp of all messages in
> > the message set as is the case in the current message format (I will
> update
> > the document to make this explicit). We could make the message timestamps
> > relative to the max timestamp, but that makes serialization a bit awkward
> > since the timestamps are not assumed to be increasing sequentially or
> > monotonically. Once the messages in the message set had been determined,
> we
> > would need to go back and adjust the relative timestamps.
> >
> > Here's one idea. We let the timestamps in the messages be varints, but we
> > make their values be relative to the timestamp of the previous message,
> > with the timestamp of the first message being absolute. For example, if
> we
> > had timestamps 500, 501, 499, then we would write 500 for the first
> > message, 1 for the next, and -2 for the final message. Would that work?
> Let
> > me think a bit about it and see if there are any problems.
> >
> > -Jason
> >
> > On Thu, Feb 2, 2017 at 9:04 AM, Apurva Mehta 
> wrote:
> >
> >> Good 

[DISCUSS] KIP-118: Drop Support for Java 7 in Kafka 0.11

2017-02-03 Thread Ismael Juma
Hi all,

I have posted a KIP for dropping support for Java 7 in Kafka 0.11:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-118%3A+Drop+Support+for+Java+7+in+Kafka+0.11

Most people were supportive when we last discussed the topic[1], but there
were a few concerns. I believe the following should mitigate the concerns:

1. The new proposal suggests dropping support in the next major version
instead of the next minor version.
2. KIP-97 which is part of 0.10.2 means that 0.11 clients will support 0.10
brokers (0.11 brokers will also support 0.10 clients as usual), so there is
even more flexibility on incremental upgrades.
3. Java 9 will be released shortly after the next Kafka release, so we'll
be supporting the 2 most recent Java releases, which is a reasonable policy.
4. 8 months have passed since the last proposal and the release after
0.10.2 won't be out for another 4 months, which should hopefully be enough
time for Java 8 to be even more established. We haven't decided when the
next major release will happen, but we know that it won't happen before
June 2017.

Please take a look at the proposal and share your feedback.

Thanks,
Ismael

[1] http://search-hadoop.com/m/Kafka/uyzND1oIhV61GS5Sf2


[jira] [Created] (KAFKA-4730) Streams does not have an in-memory windowed store

2017-02-03 Thread Eno Thereska (JIRA)
Eno Thereska created KAFKA-4730:
---

 Summary: Streams does not have an in-memory windowed store
 Key: KAFKA-4730
 URL: https://issues.apache.org/jira/browse/KAFKA-4730
 Project: Kafka
  Issue Type: New Feature
  Components: streams
Affects Versions: 0.10.2.0
Reporter: Eno Thereska
 Fix For: 0.10.3.0


Streams has windowed persistent stores (e.g., see the PersistentKeyValueFactory 
interface with its "windowed" method); however, it does not allow for windowed 
in-memory stores (e.g., see the InMemoryKeyValueFactory interface). 

In addition to the interface not allowing it, Streams does not actually have an 
implementation of an in-memory windowed store.

The implications are that operations that require windowing cannot use 
in-memory stores. 
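
To make the asymmetry concrete, a hedged sketch against the 0.10.x Stores builder (treat exact method signatures as assumptions):

    import org.apache.kafka.streams.processor.StateStoreSupplier;
    import org.apache.kafka.streams.state.Stores;

    // The persistent() branch (PersistentKeyValueFactory) offers a
    // windowed(...) method; the inMemory() branch (InMemoryKeyValueFactory)
    // has no equivalent, which is the gap this ticket describes.
    public class WindowedStoreGap {
        public static void main(String[] args) {
            StateStoreSupplier persistent = Stores.create("counts")
                .withStringKeys()
                .withLongValues()
                .persistent()   // could continue with .windowed(...) here
                .build();

            StateStoreSupplier inMemory = Stores.create("counts-mem")
                .withStringKeys()
                .withLongValues()
                .inMemory()     // no .windowed(...) available on this branch
                .build();
        }
    }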



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-03 Thread Damian Guy
Hi Matthias,

It possibly doesn't make sense to disable it, but then I'm sure someone
will come up with a reason they don't want it!
I'm happy to change it such that the checkpoint interval must be > 0.

Cheers,
Damian
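
For concreteness, a hedged sketch of how the two intervals discussed below fit together; the checkpoint config name is a placeholder, not necessarily the name the KIP will finalize.

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    public class CheckpointIntervalSketch {
        public static Properties config() {
            Properties props = new Properties();
            // Commit every 30s (existing config).
            props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 30_000);
            // Placeholder name for the KIP-116 setting (an assumption): since
            // the effective interval is a multiple of commit.interval.ms,
            // 300s here means a checkpoint on every 10th commit.
            props.put("statestore.checkpoint.interval.ms", 300_000);
            return props;
        }
    }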

On Fri, 3 Feb 2017 at 01:29 Matthias J. Sax  wrote:

> Thanks Damian.
>
> One more question: "Checkpointing is disabled if the checkpoint interval
> is set to a value <=0."
>
>
> Does it make sense to disable checkpointing? What's the tradeoff here?
>
>
> -Matthias
>
>
> On 2/2/17 1:51 AM, Damian Guy wrote:
> > Hi Matthias,
> >
> > Thanks for the comments.
> >
> > 1. TBD - I need to do some performance tests and try to work out a
> > sensible default.
> > 2. Yes, you are correct. It could be a multiple of the commit.interval.ms.
> > But that would also mean if you change the commit interval - say you lower
> > it - then you might also need to change the checkpoint setting (i.e., you
> > still only want to checkpoint every n minutes).
> >
> > On Wed, 1 Feb 2017 at 23:46 Matthias J. Sax  wrote:
> >
> >> Thanks for the KIP Damian.
> >>
> >> I am wondering about two things:
> >>
> >> 1. what should be the default value for the new parameter?
> >> 2. why is the new parameter provided in ms?
> >>
> >> About (2): because
> >>
> >> "the minimum checkpoint interval will be the value of
> >> commit.interval.ms. In effect the actual checkpoint interval will be a
> >> multiple of the commit interval"
> >>
> >> it might be easier to just use a parameter that is "number-of-commit-intervals".
> >>
> >>
> >> -Matthias
> >>
> >>
> >> On 2/1/17 7:29 AM, Damian Guy wrote:
> >>> Thanks for the comments Eno.
> >>> As for exactly once, I don't believe this matters as we are just restoring
> >>> the change-log, i.e., the result of the aggregations that previously ran,
> >>> etc. So once initialized the state store will be in the same state as it
> >>> was before.
> >>> Having the checkpoint in a Kafka topic is not ideal as the state is per
> >>> Kafka Streams instance. So each instance would need to start with a unique
> >>> id that is persistent.
> >>>
> >>> Cheers,
> >>> Damian
> >>>
> >>> On Wed, 1 Feb 2017 at 13:20 Eno Thereska  wrote:
> >>>
>  As a follow up to my previous comment, have you thought about writing the
>  checkpoint to a topic instead of a local file? That would have the
>  advantage that all metadata continues to be managed by Kafka, as well as
>  fit with EoS. The potential disadvantage would be slower latency; however,
>  if it is periodic as you mention, I'm not sure that would be a show stopper.
> 
>  Thanks
>  Eno
> > On 1 Feb 2017, at 12:58, Eno Thereska  wrote:
> >
> > Thanks Damian, this is a good idea and will reduce the restore time.
> > Looking forward, with exactly once and support for transactions in Kafka, I
> > believe we'll have to add some support for rolling back checkpoints, e.g.,
> > when a transaction is aborted. We need to be aware of that and ideally
> > anticipate those needs a bit in the KIP.
> >
> > Thanks
> > Eno
> >
> >
> >> On 1 Feb 2017, at 10:18, Damian Guy  wrote:
> >>
> >> Hi all,
> >>
> >> I would like to start the discussion on KIP-116:
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-116+-+Add+State+Store+Checkpoint+Interval+Configuration
> >>
> >> Thanks,
> >> Damian
> >
> 
> 
> >>>
> >>
> >>
> >
>
>


[jira] [Created] (KAFKA-4729) Stores for kstream-kstream join cannot be in-memory

2017-02-03 Thread Eno Thereska (JIRA)
Eno Thereska created KAFKA-4729:
---

 Summary: Stores for kstream-kstream join cannot be in-memory
 Key: KAFKA-4729
 URL: https://issues.apache.org/jira/browse/KAFKA-4729
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Affects Versions: 0.10.2.0
Reporter: Eno Thereska
 Fix For: 0.10.3.0


Whereas we can specify in the DSL that stores used for aggregates can be 
RocksDB-based or in-memory, we cannot do that for stores used for 
KStream-KStream joins. E.g., the join() method in KStreamImpl.java creates two 
state stores and the user does not have the option of having them be in-memory:

    StateStoreSupplier thisWindow =
        createWindowedStateStore(windows, keySerde, lhsValueSerde, joinThisName + "-store");

    StateStoreSupplier otherWindow =
        createWindowedStateStore(windows, keySerde, otherValueSerde, joinOtherName + "-store");


Part of the problem is that for joins, stores are not exposed to the user. We 
might want to rethink that.
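
One hypothetical shape for exposing them -- purely illustrative, this overload does not exist in the current API:

    // Hypothetical join() overload (not an existing Kafka Streams API) that
    // would let callers supply their own window stores, e.g. in-memory ones,
    // instead of the two hard-coded persistent stores created above.
    <VO, VR> KStream<K, VR> join(final KStream<K, VO> otherStream,
                                 final ValueJoiner<V, VO, VR> joiner,
                                 final JoinWindows windows,
                                 final StateStoreSupplier thisStoreSupplier,
                                 final StateStoreSupplier otherStoreSupplier);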



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4728) KafkaConsumer#commitSync should clone its input

2017-02-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851276#comment-15851276
 ] 

ASF GitHub Bot commented on KAFKA-4728:
---

GitHub user je-ik opened a pull request:

https://github.com/apache/kafka/pull/2491

 [KAFKA-4728] KafkaConsumer#commitSync should clone its input



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/je-ik/kafka KAFKA-4728

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2491.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2491


commit 3f0bf0e5c24e9c4a4c4149c14438b2f95f870b32
Author: Jan Lukavsky 
Date:   2017-02-03T09:52:18Z

 [KAFKA-4728] KafkaConsumer#commitSync should clone its input




> KafkaConsumer#commitSync should clone its input 
> 
>
> Key: KAFKA-4728
> URL: https://issues.apache.org/jira/browse/KAFKA-4728
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.1
>Reporter: Jan Lukavský
> Attachments: KAFKA-4728.patch
>
>
> The `commitSync` method should clone its input map to prevent a live-lock in 
> situations where user code synchronizes on the map. Cloning the map 
> prevents the consumer of the response to the `commitSync` request from blocking 
> (this might be a different thread than the original issuer of the request).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
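
To make the proposed fix concrete, here is a minimal sketch of the defensive copy (assuming the internal acquire/release guard and coordinator.commitOffsetsSync call from the 0.10.x KafkaConsumer; a sketch, not the actual patch):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    // Fragment of KafkaConsumer#commitSync. Copying the caller's map up front
    // means the response-handling thread only ever touches the private copy,
    // so caller-side synchronization on the original map cannot live-lock it.
    @Override
    public void commitSync(Map<TopicPartition, OffsetAndMetadata> offsets) {
        acquire();
        try {
            coordinator.commitOffsetsSync(new HashMap<>(offsets));
        } finally {
            release();
        }
    }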


[GitHub] kafka pull request #2491: [KAFKA-4728] KafkaConsumer#commitSync should clone...

2017-02-03 Thread je-ik
GitHub user je-ik opened a pull request:

https://github.com/apache/kafka/pull/2491

 [KAFKA-4728] KafkaConsumer#commitSync should clone its input



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/je-ik/kafka KAFKA-4728

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2491.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2491


commit 3f0bf0e5c24e9c4a4c4149c14438b2f95f870b32
Author: Jan Lukavsky 
Date:   2017-02-03T09:52:18Z

 [KAFKA-4728] KafkaConsumer#commitSync should clone its input




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4728) KafkaConsumer#commitSync should clone its input

2017-02-03 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851268#comment-15851268
 ] 

Ismael Juma commented on KAFKA-4728:


Can you do a pull request as described here 
https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes ? 
Thanks!

> KafkaConsumer#commitSync should clone its input 
> 
>
> Key: KAFKA-4728
> URL: https://issues.apache.org/jira/browse/KAFKA-4728
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.1
>Reporter: Jan Lukavský
> Attachments: KAFKA-4728.patch
>
>
> The `commitSync` method should clone its input map to prevent a live-lock in 
> situations where user code synchronizes on the map. Cloning the map 
> prevents the consumer of the response to the `commitSync` request from blocking 
> (this might be a different thread than the original issuer of the request).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4728) KafkaConsumer#commitSync should clone its input

2017-02-03 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/KAFKA-4728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Lukavský updated KAFKA-4728:

Status: Patch Available  (was: Open)

Patch available for this issue.

> KafkaConsumer#commitSync should clone its input 
> 
>
> Key: KAFKA-4728
> URL: https://issues.apache.org/jira/browse/KAFKA-4728
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.1
>Reporter: Jan Lukavský
> Attachments: KAFKA-4728.patch
>
>
> The `commitSync` method should clone its input map to prevent a live-lock in 
> situations where user code synchronizes on the map. Cloning the map 
> prevents the consumer of the response to the `commitSync` request from blocking 
> (this might be a different thread than the original issuer of the request).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4728) KafkaConsumer#commitSync should clone its input

2017-02-03 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/KAFKA-4728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Lukavský updated KAFKA-4728:

Attachment: KAFKA-4728.patch

Attaching patch for this.

> KafkaConsumer#commitSync should clone its input 
> 
>
> Key: KAFKA-4728
> URL: https://issues.apache.org/jira/browse/KAFKA-4728
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.1
>Reporter: Jan Lukavský
> Attachments: KAFKA-4728.patch
>
>
> The `commitSync` method should clone its input map to prevent a live-lock in 
> situations where user code synchronizes on the map. Cloning the map 
> prevents the consumer of the response to the `commitSync` request from blocking 
> (this might be a different thread than the original issuer of the request).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-4728) KafkaConsumer#commitSync should clone its input

2017-02-03 Thread JIRA
Jan Lukavský created KAFKA-4728:
---

 Summary: KafkaConsumer#commitSync should clone its input 
 Key: KAFKA-4728
 URL: https://issues.apache.org/jira/browse/KAFKA-4728
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.10.1.1
Reporter: Jan Lukavský


The `commitSync` method should clone its input map to prevent a live-lock in 
situations where user code synchronizes on the map. Cloning the map 
prevents the consumer of the response to the `commitSync` request from blocking 
(this might be a different thread than the original issuer of the request).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4276) REST configuration not visible in connector properties config files

2017-02-03 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851233#comment-15851233
 ] 

Ewen Cheslack-Postava commented on KAFKA-4276:
--

[~akhilesh_naidu] -- I've set you as assignee for this ticket.

Yes, the changes should be in the second file. That's the one that is shipped 
with the release. The former is just for unit tests.

For the second question, you might want to include both versions. The ones in 
a) are what is advertised to other workers, i.e. URLs that are routable from 
other servers. The ones in b) are used to bind to the interface that listens 
for requests. In some situations, the naming of host/port used to bind to 
interfaces and the naming used to connect externally can be different (e.g. in 
AWS, in Docker containers, etc.).

And for your final question, we can comment them out or just include the 
default values (which would have been used anyway). Either works fine as long 
as we include comments explaining the details of the configuration values.
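
For example (host names below are placeholders), a worker in a container or behind NAT might set both pairs in connect-distributed.properties:

    # Bind the REST server locally, on all interfaces:
    rest.host.name=0.0.0.0
    rest.port=8083
    # Advertise the externally routable address to other workers:
    rest.advertised.host.name=worker1.example.com
    rest.advertised.port=18083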

> REST configuration not visible in connector properties config files
> ---
>
> Key: KAFKA-4276
> URL: https://issues.apache.org/jira/browse/KAFKA-4276
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Gwen Shapira
>Assignee: Akhilesh Naidu
>  Labels: newbie
>
> REST host and port configs are not visible in connect-distributed.properties. 
> I think this leads to some confusion as users don't know there's even a REST 
> port and need to read the docs to find out about it and the default (and these 
> are marked as LOW configs).
> We can easily improve that.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (KAFKA-4276) REST configuration not visible in connector properties config files

2017-02-03 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava reassigned KAFKA-4276:


Assignee: Akhilesh Naidu

> REST configuration not visible in connector properties config files
> ---
>
> Key: KAFKA-4276
> URL: https://issues.apache.org/jira/browse/KAFKA-4276
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Gwen Shapira
>Assignee: Akhilesh Naidu
>  Labels: newbie
>
> REST host and port configs are not visible in connect-distributed.properties. 
> I think this leads to some confusion as users don't know there's even a REST 
> port and need to read the docs to find out about it and the default (and these 
> are marked as LOW configs).
> We can easily improve that.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (KAFKA-4276) REST configuration not visible in connector properties config files

2017-02-03 Thread Akhilesh Naidu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851200#comment-15851200
 ] 

Akhilesh Naidu edited comment on KAFKA-4276 at 2/3/17 8:04 AM:
---

Hi [~gwenshap] / [~ewencp],
I would like to work on this issue and have some questions.

The file 'connect-distributed.properties' is present in two locations
 i) kafka/tests/kafkatest/tests/connect/templates/connect-distributed.properties
ii) kafka/config/connect-distributed.properties
I'm assuming the changes need to be performed in the second file.

Also the docs contain two parameters each for host and port
a) rest.advertised.host.name  &  rest.advertised.port
b) rest.host.name  &  rest.port
Here I'm also assuming you mean the second set of properties.

And I presume these values have to be set commented out, 
e.g.
#rest.host.name=""
#rest.port=8083


was (Author: akhilesh_naidu):
Hi Gwen,
I would like to work on this issue and have some questions.

The file 'connect-distributed.properties' is present in two locations
 i) kafka/tests/kafkatest/tests/connect/templates/connect-distributed.properties
ii) kafka/config/connect-distributed.properties
I'm assuming the changes need to be performed in the second file.

Also the docs contain two parameters each for host and port
a) rest.advertised.host.name  &  rest.advertised.port
b) rest.host.name  &  rest.port
Here I'm also assuming you mean the second set of properties.

And I presume these values have to be set commented out, 
e.g.
#rest.host.name=""
#rest.port=8083

> REST configuration not visible in connector properties config files
> ---
>
> Key: KAFKA-4276
> URL: https://issues.apache.org/jira/browse/KAFKA-4276
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Gwen Shapira
>  Labels: newbie
>
> REST host and port configs are not visible in connect-distributed.properties. 
> I think this leads to some confusion as users don't know there's even a REST 
> port and need to read the docs to find out about it and the default (and these 
> are marked as LOW configs).
> We can easily improve that.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: Trying to understand design decision about producer ack and min.insync.replicas

2017-02-03 Thread Ismael Juma
I'd be in favour too.

Ismael

On 3 Feb 2017 7:33 am, "Ewen Cheslack-Postava"  wrote:

> On Thu, Feb 2, 2017 at 11:21 PM, James Cheng  wrote:
>
> > Ewen,
> >
> > Ah right, that's a good point.
> >
> > My initial reaction to your examples was that "well, those should be in
> > separate topics", but then I realized that people choose their topics for a
> > variety of reasons. Sometimes they organize it based on their producers,
> > sometimes they organize it based on the nature of the data, but sometimes
> > (as you gave examples about), they may organize it based on the consuming
> > application. And there are valid reasons to want different data types in a
> > single topic:
> >
> > 1) You get global ordering
> > 2) You get persistent ordering in the case of re-reads (whereas reading 2
> > topics would cause different ordering upon re-reads.)
> > 3) Logically-related data types all co-located.
> >
> > I do still think it'd be convenient to only have to set
> > min.insync.replicas on a topic and not have to require producing
> > applications to also set acks=all. It'd then be a single thing you have to
> > configure, instead of the current 2 things. (Since, as currently
> > implemented, you have to set both things in order to achieve high
> > durability.)
> >
>
> I entirely agree, I think the default should be acks=all and then this
> would be true :) Similar to the unclean leader election setting, I think
> defaulting to durable is a better choice. I understand historically why a
> different choice was made (Kafka didn't start out as a replicated, durable
> storage system), but given how it has evolved I think durable by default
> would be a better choice on both the broker and producer.
>
>
> >
> > But I admit that it's hard to find the balance of
> > features/simplicity/complexity to handle all the use cases.
> >
>
> Perhaps the KIP-106 adjustment to unclean leader election could benefit
> from a sister KIP for adjusting the default producer acks setting?
>
> Not sure how popular it would be, but I would be in favor.
>
> -Ewen
>
>
> >
> > Thanks,
> > -James
> >
> > > On Feb 2, 2017, at 9:42 PM, Ewen Cheslack-Postava  wrote:
> > >
> > > James,
> > >
> > > Great question, I probably should have been clearer. Log data is an example
> > > where the app (or even instance of the app) might know best what the right
> > > tradeoff is. Depending on your strategy for managing logs, you may or may
> > > not be mixing multiple logs (and logs from different deployments) into the
> > > same topic. For example, if you key by application, then you have an easy
> > > way to split logs up while still getting a global feed of log messages.
> > > Maybe logs from one app are really critical and we want to retry, but logs
> > > from another app are just nice to have.
> > >
> > > There are other examples even within a single app. For example, a gaming
> > > company might report data from a user of a game to the same topic but want
> > > 2 producers with different reliability levels (and possibly where the
> > > ordering constraints across the two sets that might otherwise cause you to
> > > use a single consumer are not an issue). High-frequency telemetry on a
> > > player might be desirable to have, but not the end of the world if some is
> > > lost. In contrast, they may want a stronger guarantee for, e.g., something
> > > like chat messages, where they want to have a permanent record of them in
> > > all circumstances.
> > >
> > > -Ewen
> > >
> > > On Fri, Jan 27, 2017 at 12:59 AM, James Cheng  wrote:
> > >
> > >>
> > >>> On Jan 27, 2017, at 12:18 AM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
> > >>>
> > >>> On Thu, Jan 26, 2017 at 4:23 PM, Luciano Afranllie <listas.luaf...@gmail.com> wrote:
> > >>>
> >  I was thinking about the situation where you have fewer brokers in the
> >  ISR list than the number set in min.insync.replicas.
> > 
> >  My idea was that if I, as an administrator, for a given topic, want to
> >  favor durability over availability, then if that topic has fewer ISRs
> >  than the value set in min.insync.replicas I may want to stop producing
> >  to the topic. Given the way min.insync.replicas and acks work, I need
> >  to coordinate with all producers in order to achieve this. There is no
> >  way (or I don't know of one) to globally stop producing to a topic if
> >  it is under-replicated.
> > 
> >  I don't see why, for the same topic, some producers might want to get
> >  an error when the number of ISRs is below min.insync.replicas while
> >  other producers don't. I think it could be more useful to be able to
> >  set that ALL producers should get an error when a given topic is
> >  under-replicated so they stop
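
To make the two-setting coupling in this thread concrete, a hedged sketch (the bootstrap address and values are placeholders):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.ProducerConfig;

    // min.insync.replicas is set on the topic or broker (e.g.
    // min.insync.replicas=2), but it is only enforced for producers that
    // send with acks=all -- hence the two-knob coordination problem above.
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // placeholder
    props.put(ProducerConfig.ACKS_CONFIG, "all"); // without acks=all, min.insync.replicas never rejects a send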

[jira] [Commented] (KAFKA-4276) REST configuration not visible in connector properties config files

2017-02-03 Thread Akhilesh Naidu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851200#comment-15851200
 ] 

Akhilesh Naidu commented on KAFKA-4276:
---

Hi Gwen,
I would like to work on this issue and have some questions.

The file 'connect-distributed.properties' is present in two locations
 i) kafka/tests/kafkatest/tests/connect/templates/connect-distributed.properties
ii) kafka/config/connect-distributed.properties
I'm assuming the changes need to be performed in the second file.

Also the docs contain two parameters each for host and port
a) rest.advertised.host.name  &  rest.advertised.port
b) rest.host.name  &  rest.port
Here I'm also assuming you mean the second set of properties.

And I presume these values have to be set commented out, 
e.g.
#rest.host.name=""
#rest.port=8083

> REST configuration not visible in connector properties config files
> ---
>
> Key: KAFKA-4276
> URL: https://issues.apache.org/jira/browse/KAFKA-4276
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Gwen Shapira
>  Labels: newbie
>
> REST host and port configs are not visible in connect-distributed.properties. 
> I think this leads to some confusion as users don't know there's even a REST 
> port and need to read the docs to find out about it and the default (and these 
> are marked as LOW configs).
> We can easily improve that.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)