Re: [VOTE] 0.10.0.1 RC0

2016-07-29 Thread Matthias J. Sax
Hi,

I think we need a new RC because of

https://issues.apache.org/jira/browse/KAFKA-4008

-Matthias


On 07/29/2016 04:59 PM, Harsha Chintalapani wrote:
> Hi Ismael,
>  I would like this JIRA to be included in the minor release
> https://issues.apache.org/jira/browse/KAFKA-3950
> Thanks,
> Harsha
> On Fri, Jul 29, 2016 at 7:46 AM Ismael Juma  wrote:
> 
>> Hello Kafka users, developers and client-developers,
>>
>> This is the first candidate for the release of Apache Kafka 0.10.0.1. This
>> is a bug fix release and it includes fixes and improvements from 50 JIRAs
>> (including a few critical bugs). See the release notes for more details:
>>
>> http://home.apache.org/~ijuma/kafka-0.10.0.1-rc0/RELEASE_NOTES.html
>>
>> *** Please download, test and vote by Monday, 1 August, 8am PT ***
>>
>> Kafka's KEYS file containing PGP keys we use to sign the release:
>> http://kafka.apache.org/KEYS
>>
>> * Release artifacts to be voted upon (source and binary):
>> http://home.apache.org/~ijuma/kafka-0.10.0.1-rc0/
>>
>> * Maven artifacts to be voted upon:
>> https://repository.apache.org/content/groups/staging
>>
>> * Javadoc:
>> http://home.apache.org/~ijuma/kafka-0.10.0.1-rc0/javadoc/
>>
>> * Tag to be voted upon (off 0.10.0 branch) is the 0.10.0.1-rc0 tag:
>>
>> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=0c2322c2cf7ab7909cfd8b834d1d2fffc34db109
>>
>> * Documentation:
>> http://kafka.apache.org/0100/documentation.html
>>
>> * Protocol:
>> http://kafka.apache.org/0100/protocol.html
>>
>> * Successful Jenkins builds for the 0.10.0 branch:
>> Unit/integration tests:
>> https://builds.apache.org/job/kafka-0.10.0-jdk7/170/
>> System tests:
>> https://jenkins.confluent.io/job/system-test-kafka-0.10.0/130/
>>
>> Thanks,
>> Ismael
>>
> 





[jira] [Created] (KAFKA-4008) Module "tools" should not be dependent on "core"

2016-07-29 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-4008:
--

 Summary: Module "tools" should not be dependent on "core"
 Key: KAFKA-4008
 URL: https://issues.apache.org/jira/browse/KAFKA-4008
 Project: Kafka
  Issue Type: Bug
  Components: core, tools
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax
Priority: Critical
 Fix For: 0.10.0.1


The newly introduced "Stream Application Reset Tool" added a dependency on 
{{core}} to module {{tools}}. We want to get rid of this dependency.
Solution: move {{StreamsResetter}} into module {{core}}.

Remark: {{StreamsResetter}} should actually be in module {{streams}}; however, 
this change is blocked by KIP-4.
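
For context, the unwanted edge is a Gradle module dependency. A minimal sketch of what such an edge looks like (illustrative only, not the project's actual build.gradle):

{code}
// Hypothetical build.gradle fragment; Kafka's real build file differs.
// The fix removes this tools -> core edge by relocating StreamsResetter.
project(':tools') {
  dependencies {
    compile project(':core')   // the dependency this issue gets rid of
  }
}
{code}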



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3950) kafka mirror maker tool is not respecting whitelist option

2016-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400428#comment-15400428
 ] 

ASF GitHub Bot commented on KAFKA-3950:
---

GitHub user omkreddy opened a pull request:

https://github.com/apache/kafka/pull/1687

KAFKA-3950: kafka mirror maker tool is not respecting whitelist option



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/omkreddy/kafka KAFKA-3950

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1687.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1687


commit f43d29418fbfb79d40acbda7ad2ec18dd57bcac7
Author: Manikumar Reddy O 
Date:   2016-07-29T17:39:59Z

KAFKA-3950: kafka mirror maker tool is not respecting whitelist option




> kafka mirror maker tool is not respecting whitelist option
> --
>
> Key: KAFKA-3950
> URL: https://issues.apache.org/jira/browse/KAFKA-3950
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raghav Kumar Gautam
>Assignee: Manikumar Reddy
>Priority: Critical
>
> A mirror maker launched like this:
> {code}
> /usr/bin/kinit -k -t /home/kfktest/hadoopqa/keytabs/kfktest.headless.keytab 
> kfkt...@example.com
> JAVA_HOME=/usr/jdk64/jdk1.8.0_77 JMX_PORT=9112 
> /usr/kafka/bin/kafka-run-class.sh kafka.tools.MirrorMaker --consumer.config 
> /usr/kafka/system_test/mirror_maker_testsuite/testcase_15001/config/mirror_consumer_12.properties
>  --producer.config 
> /usr/kafka/system_test/mirror_maker_testsuite/testcase_15001/config/mirror_producer_12.properties
>  --new.consumer --whitelist="test.*" >>  
> /usr/kafka/system_test/mirror_maker_testsuite/testcase_15001/logs/mirror_maker-12/mirror_maker_12.log
>  2>&1 & echo pid:$! >  
> /usr/kafka/system_test/mirror_maker_testsuite/testcase_15001/logs/mirror_maker-12/entity_12_pid
> {code}
> Led to a TopicAuthorizationException:
> {code}
> WARN Error while fetching metadata with correlation id 44 : 
> {__consumer_offsets=TOPIC_AUTHORIZATION_FAILED} 
> (org.apache.kafka.clients.NetworkClient)
> [2016-06-20 13:24:49,983] FATAL [mirrormaker-thread-0] Mirror maker thread 
> failure due to  (kafka.tools.MirrorMaker$MirrorMakerThread)
> org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to 
> access topics: [__consumer_offsets]
> {code}
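
For context on the report above: with --new.consumer, the --whitelist regex is handed to the Java consumer's pattern subscription. A minimal sketch of that mechanism (configuration values are illustrative and the class name is invented):

{code}
import java.util.Properties;
import java.util.regex.Pattern;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.internals.NoOpConsumerRebalanceListener;

// Sketch of the pattern subscription behind --new.consumer --whitelist="test.*".
public class WhitelistSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "mirror-maker-sketch");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            // The regex is matched against topic metadata as it refreshes;
            // internal topics such as __consumer_offsets must not be swept
            // into the match, or fetches can fail with
            // TopicAuthorizationException as in the log above.
            consumer.subscribe(Pattern.compile("test.*"),
                    new NoOpConsumerRebalanceListener());
        }
    }
}
{code}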



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1687: KAFKA-3950: kafka mirror maker tool is not respect...

2016-07-29 Thread omkreddy
GitHub user omkreddy opened a pull request:

https://github.com/apache/kafka/pull/1687

KAFKA-3950: kafka mirror maker tool is not respecting whitelist option



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/omkreddy/kafka KAFKA-3950

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1687.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1687


commit f43d29418fbfb79d40acbda7ad2ec18dd57bcac7
Author: Manikumar Reddy O 
Date:   2016-07-29T17:39:59Z

KAFKA-3950: kafka mirror maker tool is not respecting whitelist option




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3950) kafka mirror maker tool is not respecting whitelist option

2016-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400419#comment-15400419
 ] 

ASF GitHub Bot commented on KAFKA-3950:
---

Github user omkreddy closed the pull request at:

https://github.com/apache/kafka/pull/1615


> kafka mirror maker tool is not respecting whitelist option
> --
>
> Key: KAFKA-3950
> URL: https://issues.apache.org/jira/browse/KAFKA-3950
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raghav Kumar Gautam
>Assignee: Manikumar Reddy
>Priority: Critical
>
> A mirror maker launched like this:
> {code}
> /usr/bin/kinit -k -t /home/kfktest/hadoopqa/keytabs/kfktest.headless.keytab 
> kfkt...@example.com
> JAVA_HOME=/usr/jdk64/jdk1.8.0_77 JMX_PORT=9112 
> /usr/kafka/bin/kafka-run-class.sh kafka.tools.MirrorMaker --consumer.config 
> /usr/kafka/system_test/mirror_maker_testsuite/testcase_15001/config/mirror_consumer_12.properties
>  --producer.config 
> /usr/kafka/system_test/mirror_maker_testsuite/testcase_15001/config/mirror_producer_12.properties
>  --new.consumer --whitelist="test.*" >>  
> /usr/kafka/system_test/mirror_maker_testsuite/testcase_15001/logs/mirror_maker-12/mirror_maker_12.log
>  2>&1 & echo pid:$! >  
> /usr/kafka/system_test/mirror_maker_testsuite/testcase_15001/logs/mirror_maker-12/entity_12_pid
> {code}
> Led to a TopicAuthorizationException:
> {code}
> WARN Error while fetching metadata with correlation id 44 : 
> {__consumer_offsets=TOPIC_AUTHORIZATION_FAILED} 
> (org.apache.kafka.clients.NetworkClient)
> [2016-06-20 13:24:49,983] FATAL [mirrormaker-thread-0] Mirror maker thread 
> failure due to  (kafka.tools.MirrorMaker$MirrorMakerThread)
> org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to 
> access topics: [__consumer_offsets]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1615: KAFKA-3950: kafka mirror maker tool is not respect...

2016-07-29 Thread omkreddy
Github user omkreddy closed the pull request at:

https://github.com/apache/kafka/pull/1615


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Build failed in Jenkins: kafka-trunk-jdk7 #1451

2016-07-29 Thread Apache Jenkins Server
See 

Changes:

[cshapi] KAFKA-3946: Protocol guide should say that Produce request acks can 
o…

--
[...truncated 3580 lines...]
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.dispatch.FailureHandlingDispatch.dispatch(FailureHandlingDispatch.java:29)
at 
org.gradle.internal.dispatch.AsyncDispatch.dispatchMessages(AsyncDispatch.java:132)
at 
org.gradle.internal.dispatch.AsyncDispatch.access$000(AsyncDispatch.java:33)
at 
org.gradle.internal.dispatch.AsyncDispatch$1.run(AsyncDispatch.java:72)
at 
org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:54)
at 
org.gradle.internal.concurrent.StoppableExecutorImpl$1.run(StoppableExecutorImpl.java:40)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
unable to create new native thread
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:713)
at 
java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1371)
at 
org.gradle.internal.concurrent.StoppableExecutorImpl.execute(StoppableExecutorImpl.java:36)
at 
org.gradle.internal.remote.internal.inet.TcpIncomingConnector.accept(TcpIncomingConnector.java:69)
at 
org.gradle.internal.remote.internal.hub.MessageHubBackedServer.accept(MessageHubBackedServer.java:37)
at 
org.gradle.process.internal.worker.DefaultWorkerProcessBuilder.build(DefaultWorkerProcessBuilder.java:146)
at 
org.gradle.api.internal.tasks.testing.worker.ForkingTestClassProcessor.forkProcess(ForkingTestClassProcessor.java:78)
at 
org.gradle.api.internal.tasks.testing.worker.ForkingTestClassProcessor.processTestClass(ForkingTestClassProcessor.java:64)
at 
org.gradle.api.internal.tasks.testing.processors.RestartEveryNTestClassProcessor.processTestClass(RestartEveryNTestClassProcessor.java:47)
at sun.reflect.GeneratedMethodAccessor312.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.dispatch.FailureHandlingDispatch.dispatch(FailureHandlingDispatch.java:29)
at 
org.gradle.internal.dispatch.AsyncDispatch.dispatchMessages(AsyncDispatch.java:132)
at 
org.gradle.internal.dispatch.AsyncDispatch.access$000(AsyncDispatch.java:33)
at 
org.gradle.internal.dispatch.AsyncDispatch$1.run(AsyncDispatch.java:72)
at 
org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:54)
at 
org.gradle.internal.concurrent.StoppableExecutorImpl$1.run(StoppableExecutorImpl.java:40)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
unable to create new native thread
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:713)
at 
java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1371)
at 
org.gradle.internal.concurrent.StoppableExecutorImpl.execute(StoppableExecutorImpl.java:36)
at 
org.gradle.internal.remote.internal.inet.TcpIncomingConnector.accept(TcpIncomingConnector.java:69)
at 
org.gradle.internal.remote.internal.hub.MessageHubBackedServer.accept(MessageHubBackedServer.java:37)
at 
org.gradle.process.internal.worker.DefaultWorkerProcessBuilder.build(DefaultWorkerProcessBuilder.java:146)
at 
org.gradle.api.internal.tasks.testing.worker.ForkingTestClassProcessor.forkProcess(ForkingTestClassProcessor.java:78)
at 
org.gradle.api.internal.tasks.testing.worker.ForkingTestClassProcessor.processTestClass(ForkingTestClassProcessor.java:64)
at 
org.gradle.api.internal.tasks.testing.processors.RestartEveryNTestClassProcessor.processTestClass(RestartEveryNTestClassProcessor.java:47)
   

Build failed in Jenkins: kafka-trunk-jdk8 #786

2016-07-29 Thread Apache Jenkins Server
See 

Changes:

[cshapi] KAFKA-3946: Protocol guide should say that Produce request acks can o…

--
[...truncated 1368 lines...]
kafka.server.EdgeCaseRequestTest > testInvalidApiVersionRequest PASSED

kafka.server.EdgeCaseRequestTest > testMalformedHeaderRequest STARTED

kafka.server.EdgeCaseRequestTest > testMalformedHeaderRequest PASSED

kafka.server.EdgeCaseRequestTest > testProduceRequestWithNullClientId STARTED

kafka.server.EdgeCaseRequestTest > testProduceRequestWithNullClientId PASSED

kafka.server.EdgeCaseRequestTest > testInvalidApiKeyRequest STARTED

kafka.server.EdgeCaseRequestTest > testInvalidApiKeyRequest PASSED

kafka.server.EdgeCaseRequestTest > testHeaderOnlyRequest STARTED

kafka.server.EdgeCaseRequestTest > testHeaderOnlyRequest PASSED

kafka.server.CreateTopicsRequestTest > testValidCreateTopicsRequests STARTED

kafka.server.CreateTopicsRequestTest > testValidCreateTopicsRequests PASSED

kafka.server.CreateTopicsRequestTest > testErrorCreateTopicsRequests STARTED

kafka.server.CreateTopicsRequestTest > testErrorCreateTopicsRequests PASSED

kafka.server.CreateTopicsRequestTest > testInvalidCreateTopicsRequests STARTED

kafka.server.CreateTopicsRequestTest > testInvalidCreateTopicsRequests PASSED

kafka.server.CreateTopicsRequestTest > testNotController STARTED

kafka.server.CreateTopicsRequestTest > testNotController PASSED

kafka.server.OffsetCommitTest > testUpdateOffsets STARTED

kafka.server.OffsetCommitTest > testUpdateOffsets PASSED

kafka.server.OffsetCommitTest > testLargeMetadataPayload STARTED

kafka.server.OffsetCommitTest > testLargeMetadataPayload PASSED

kafka.server.OffsetCommitTest > testCommitAndFetchOffsets STARTED

kafka.server.OffsetCommitTest > testCommitAndFetchOffsets PASSED

kafka.server.OffsetCommitTest > testOffsetExpiration STARTED

kafka.server.OffsetCommitTest > testOffsetExpiration PASSED

kafka.server.OffsetCommitTest > testNonExistingTopicOffsetCommit STARTED

kafka.server.OffsetCommitTest > testNonExistingTopicOffsetCommit PASSED

kafka.server.ReplicaManagerTest > testHighWaterMarkDirectoryMapping STARTED

kafka.server.ReplicaManagerTest > testHighWaterMarkDirectoryMapping PASSED

kafka.server.ReplicaManagerTest > 
testFetchBeyondHighWatermarkReturnEmptyResponse STARTED

kafka.server.ReplicaManagerTest > 
testFetchBeyondHighWatermarkReturnEmptyResponse PASSED

kafka.server.ReplicaManagerTest > testIllegalRequiredAcks STARTED

kafka.server.ReplicaManagerTest > testIllegalRequiredAcks PASSED

kafka.server.ReplicaManagerTest > testClearPurgatoryOnBecomingFollower STARTED

kafka.server.ReplicaManagerTest > testClearPurgatoryOnBecomingFollower PASSED

kafka.server.ReplicaManagerTest > testHighwaterMarkRelativeDirectoryMapping 
STARTED

kafka.server.ReplicaManagerTest > testHighwaterMarkRelativeDirectoryMapping 
PASSED

kafka.server.IsrExpirationTest > testIsrExpirationForSlowFollowers STARTED

kafka.server.IsrExpirationTest > testIsrExpirationForSlowFollowers PASSED

kafka.server.IsrExpirationTest > testIsrExpirationForStuckFollowers STARTED

kafka.server.IsrExpirationTest > testIsrExpirationForStuckFollowers PASSED

kafka.server.IsrExpirationTest > testIsrExpirationIfNoFetchRequestMade STARTED

kafka.server.IsrExpirationTest > testIsrExpirationIfNoFetchRequestMade PASSED

kafka.server.ControlledShutdownLeaderSelectorTest > testSelectLeader STARTED

kafka.server.ControlledShutdownLeaderSelectorTest > testSelectLeader PASSED

kafka.server.ClientQuotaManagerTest > testQuotaViolation STARTED

kafka.server.ClientQuotaManagerTest > testQuotaViolation PASSED

kafka.server.ClientQuotaManagerTest > testExpireQuotaSensors STARTED

kafka.server.ClientQuotaManagerTest > testExpireQuotaSensors PASSED

kafka.server.ClientQuotaManagerTest > testExpireThrottleTimeSensor STARTED

kafka.server.ClientQuotaManagerTest > testExpireThrottleTimeSensor PASSED

kafka.server.ClientQuotaManagerTest > testQuotaParsing STARTED

kafka.server.ClientQuotaManagerTest > testQuotaParsing PASSED

kafka.server.MetadataCacheTest > 
getTopicMetadataWithNonSupportedSecurityProtocol STARTED

kafka.server.MetadataCacheTest > 
getTopicMetadataWithNonSupportedSecurityProtocol PASSED

kafka.server.MetadataCacheTest > getTopicMetadataIsrNotAvailable STARTED

kafka.server.MetadataCacheTest > getTopicMetadataIsrNotAvailable PASSED

kafka.server.MetadataCacheTest > getTopicMetadata STARTED

kafka.server.MetadataCacheTest > getTopicMetadata PASSED

kafka.server.MetadataCacheTest > getTopicMetadataReplicaNotAvailable STARTED

kafka.server.MetadataCacheTest > getTopicMetadataReplicaNotAvailable PASSED

kafka.server.MetadataCacheTest > getTopicMetadataPartitionLeaderNotAvailable 
STARTED

kafka.server.MetadataCacheTest > getTopicMetadataPartitionLeaderNotAvailable 
PASSED

kafka.server.MetadataCacheTest > getAliveBrokersShouldNotBeMutatedByUpdateCache 
STARTED

kafka.server.MetadataCacheTest > getAliveBrokersSho

[jira] [Commented] (KAFKA-2063) Bound fetch response size

2016-07-29 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400224#comment-15400224
 ] 

Jay Kreps commented on KAFKA-2063:
--

Makes sense on both counts--I have no real opinion, just thought I'd throw that 
out there.

> Bound fetch response size
> -
>
> Key: KAFKA-2063
> URL: https://issues.apache.org/jira/browse/KAFKA-2063
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jay Kreps
>
> Currently the only bound on the fetch response size is 
> max.partition.fetch.bytes * num_partitions. There are two problems:
> 1. First this bound is often large. You may choose 
> max.partition.fetch.bytes=1MB to enable messages of up to 1MB. However if you 
> also need to consume 1k partitions this means you may receive a 1GB response 
> in the worst case!
> 2. The actual memory usage is unpredictable. Partition assignment changes, 
> and you only actually get the full fetch amount when you are behind and there 
> is a full chunk of data ready. This means an application that seems to work 
> fine will suddenly OOM when partitions shift or when the application falls 
> behind.
> We need to decouple the fetch response size from the number of partitions.
> The proposal for doing this would be to add a new field to the fetch request, 
> max_bytes which would control the maximum data bytes we would include in the 
> response.
> The implementation on the server side would grab data from each partition in 
> the fetch request until it hit this limit, then send back just the data for 
> the partitions that fit in the response. The implementation would need to 
> start from a random position in the list of topics included in the fetch 
> request to ensure that in a case of backlog we fairly balance between 
> partitions (to avoid first giving just the first partition until that is 
> exhausted, then the next partition, etc).
> This setting will make the max.partition.fetch.bytes field in the fetch 
> request much less useful and we should discuss just getting rid of it.
> I believe this also solves the same thing we were trying to address in 
> KAFKA-598. The max_bytes setting now becomes the new limit that would need to 
> be compared to max_message size. This can be much larger--e.g. setting a 50MB 
> max_bytes setting would be okay, whereas now if you set 50MB you may need to 
> allocate 50MB*num_partitions.
> This will require evolving the fetch request protocol version to add the new 
> field and we should do a KIP for it.
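
To make the fairness point concrete, here is a hedged sketch of the fill loop the proposal describes (names and types are illustrative, not the broker's actual code): starting at a random partition keeps any single backlogged partition from being permanently favoured.

{code}
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Illustrative server-side fill loop for a request-level max_bytes budget.
public class FetchFillSketch {
    static Map<String, Integer> fill(List<String> partitions,
                                     Map<String, Integer> availableBytes,
                                     int maxBytes) {
        Map<String, Integer> response = new LinkedHashMap<>();
        if (partitions.isEmpty())
            return response;
        int budget = maxBytes;
        int start = new Random().nextInt(partitions.size()); // random start
        for (int i = 0; i < partitions.size() && budget > 0; i++) {
            String tp = partitions.get((start + i) % partitions.size());
            int take = Math.min(availableBytes.getOrDefault(tp, 0), budget);
            if (take > 0) {
                response.put(tp, take);
                budget -= take;
            }
        }
        return response; // total payload never exceeds maxBytes
    }
}
{code}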



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1686: KAFKA-4002: task.open() should be invoked in case ...

2016-07-29 Thread Ishiihara
GitHub user Ishiihara opened a pull request:

https://github.com/apache/kafka/pull/1686

KAFKA-4002: task.open() should be invoked in case that 0 partitions is 
assigned to task



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Ishiihara/kafka open-partition

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1686.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1686


commit 1575cacb89c549544a0b02a7108ee5e069264784
Author: Liquan Pei 
Date:   2016-07-29T23:13:49Z

Call task.open() with empty partitions




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4002) task.open() should be invoked in case that 0 partitions is assigned to task.

2016-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400206#comment-15400206
 ] 

ASF GitHub Bot commented on KAFKA-4002:
---

GitHub user Ishiihara opened a pull request:

https://github.com/apache/kafka/pull/1686

KAFKA-4002: task.open() should be invoked in case that 0 partitions is 
assigned to task



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Ishiihara/kafka open-partition

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1686.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1686


commit 1575cacb89c549544a0b02a7108ee5e069264784
Author: Liquan Pei 
Date:   2016-07-29T23:13:49Z

Call task.open() with empty partitions




> task.open() should be invoked in case that 0 partitions is assigned to task. 
> -
>
> Key: KAFKA-4002
> URL: https://issues.apache.org/jira/browse/KAFKA-4002
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.0.0
>Reporter: Liquan Pei
>Assignee: Liquan Pei
> Fix For: 0.11.0.0
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
>  In the case that 0 partitions are assigned to a task, the open() call on the 
> task was not invoked, but put() was then called later anyway. The put() call 
> with empty data hands control to the task so that it can continue working on 
> the buffered data.
> If task.open() is not invoked when 0 partitions are assigned, connector 
> developers need to do special handling for this case, i.e. not call 
> any method on the writer, to avoid null pointer exceptions. To make connector 
> developers' lives easier, it is probably better to change the behavior so the 
> call is made even when 0 partitions are assigned.
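
A hedged sketch of the special handling described above (the class and its writer field are hypothetical, not Connect API requirements):

{code}
import java.util.Collection;
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

// Hypothetical sink task: if open() was skipped because 0 partitions were
// assigned, put() must not touch the writer, or it would NPE.
public class GuardedSinkTask extends SinkTask {
    private Object writer; // stand-in for a real downstream writer

    @Override public String version() { return "sketch"; }
    @Override public void start(Map<String, String> props) { }

    @Override public void open(Collection<TopicPartition> partitions) {
        writer = new Object(); // a real task would construct its writer here
    }

    @Override public void put(Collection<SinkRecord> records) {
        if (writer == null) return; // open() never ran: skip to avoid an NPE
        // hand records to the writer here
    }

    @Override public void flush(Map<TopicPartition, OffsetAndMetadata> offsets) { }
    @Override public void stop() { }
}
{code}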



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-3946) Protocol guide should say that Produce request acks can only be 0, 1, or -1

2016-07-29 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira resolved KAFKA-3946.
-
   Resolution: Fixed
Fix Version/s: 0.10.1.0

Issue resolved by pull request 1680
[https://github.com/apache/kafka/pull/1680]

> Protocol guide should say that Produce request acks can only be 0, 1, or -1
> ---
>
> Key: KAFKA-3946
> URL: https://issues.apache.org/jira/browse/KAFKA-3946
> Project: Kafka
>  Issue Type: Bug
>Reporter: James Cheng
>Assignee: Mickael Maison
>Priority: Minor
> Fix For: 0.10.1.0
>
>
> The protocol guide at http://kafka.apache.org/protocol.html#protocol_messages 
> says that for Produce requests, acks means:
> The number of nodes that should replicate the produce before returning. -1 
> indicates the full ISR.
> This seems to imply that you can specify values of 2,3,4, etc.
> It would be clearer if the description was more explicit. It should say that 
> the only valid values are 0, 1, and -1, per the code at 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/ReplicaManager.scala#L382-L384
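
On the client side, the same constraint surfaces as the producer's acks setting; a minimal sketch with illustrative configuration values:

{code}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

// Illustrative producer setup: per the guide being fixed here, acks may only
// be "0", "1", or "-1" ("all" is an alias for -1, i.e. the full ISR).
public class AcksSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("acks", "-1"); // valid: "0", "1", "-1"/"all"; 2, 3, ... are not
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        new KafkaProducer<String, String>(props).close();
    }
}
{code}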



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3946) Protocol guide should say that Produce request acks can only be 0, 1, or -1

2016-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400176#comment-15400176
 ] 

ASF GitHub Bot commented on KAFKA-3946:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1680


> Protocol guide should say that Produce request acks can only be 0, 1, or -1
> ---
>
> Key: KAFKA-3946
> URL: https://issues.apache.org/jira/browse/KAFKA-3946
> Project: Kafka
>  Issue Type: Bug
>Reporter: James Cheng
>Assignee: Mickael Maison
>Priority: Minor
> Fix For: 0.10.1.0
>
>
> The protocol guide at http://kafka.apache.org/protocol.html#protocol_messages 
> says that for Produce requests, acks means:
> The number of nodes that should replicate the produce before returning. -1 
> indicates the full ISR.
> This seems to imply that you can specify values of 2,3,4, etc.
> It would be clearer if the description was more explicit. It should say that 
> the only valid values are 0, 1, and -1, per the code at 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/ReplicaManager.scala#L382-L384



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1680: KAFKA-3946: Protocol guide should say that Produce...

2016-07-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1680


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KAFKA-2063 Add possibility to bound fetch response size (was Re: [DISCUSS] Optimise memory used by replication process by using adaptive fetch message size)

2016-07-29 Thread Ben Stopford
Andrey - I’m not sure we quite have consensus on the Randomisation vs Round 
Robin issue, but it’s probably worth you just raising a KIP and putting one of 
the options down as a rejected alternative. 

B
> On 29 Jul 2016, at 11:59, Ben Stopford  wrote:
> 
> Thanks for kicking this one off, Andrey. Generally it looks great! 
> 
> I left a comment on the Jira regarding whether we should remove the existing 
> limitBytes, along with a potential alternative to doing randomisation. 
> 
> B
>> On 29 Jul 2016, at 09:17, Andrey L. Neporada  
>> wrote:
>> 
>> Hi all!
>> 
>> I would like to get your feedback on PR for bug KAFKA-2063.
>> Looks like KIP is needed there, but it would be nice to get feedback first.
>> 
>> Thanks,
>> Andrey.
>> 
>> 
>>> On 22 Jul 2016, at 12:26, Andrey L. Neporada  
>>> wrote:
>>> 
>>> Hi!
>>> 
>>> Thanks for feedback - I agree that the proper way to fix this issue is to 
>>> provide per-request data limit.
>>> Will try to do it.
>>> 
>>> Thanks,
>>> Andrey.
>>> 
>>> 
>>> 
 On 21 Jul 2016, at 18:57, Jay Kreps  wrote:
 
 I think the memory usage for consumers can be improved a lot, but I think
 there may be a better way than what you are proposing.
 
 The problem is exactly what you describe: the bound the user sets is
 per-partition, but the number of partitions may be quite high. The consumer
 could provide a bound on the response size by only requesting a subset of
 the partitions, but this would mean that if there was no data available on
 those partitions the consumer wouldn't be checking other partitions, which
 would add latency.
 
 I think the solution is to add a new "max response size" parameter to the
 fetch request so the server checks all partitions but doesn't send back
 more than this amount in total. This has to be done carefully to ensure
 fairness (i.e. if one partition has unbounded amounts of data it shouldn't
 indefinitely starve other partitions).
 
 This will fix memory management both in the replicas and for consumers.
 
 There is a JIRA for this: https://issues.apache.org/jira/browse/KAFKA-2063
 
 I think it isn't too hard to do and would be a huge aid to the memory
 profile of both the clients and server.
 
 I also don't think there is much use in setting a max size that expands
 dynamically since in any case you have to be able to support the maximum,
 so you might as well always use that rather than expanding and contracting
 dynamically. That is, if your max fetch response size is 64MB you need to
 budget 64MB of free memory, so making it smaller some of the time doesn't
 really help you.
 
 -Jay
 
 On Thu, Jul 21, 2016 at 2:49 AM, Andrey L. Neporada <
 anepor...@yandex-team.ru> wrote:
 
> Hi all!
> 
> We noticed that our Kafka cluster uses a lot of memory for replication.
> Our Kafka usage pattern is as follows:
> 
> 1. Most messages are small (tens or hundreds of kilobytes at most), but some
> (rare) messages can be several megabytes. So, we have to set
> replica.fetch.max.bytes = max.message.bytes = 8MB
> 2. Each Kafka broker handles several thousand partitions from multiple
> topics.
> 
> In this scenario, the total memory required for replication (i.e.
> replica.fetch.max.bytes * numOfPartitions) is unreasonably big.
> 
> So we would like to propose the following approach to fix this problem:
> 
> 1. Introduce a new config parameter, replica.fetch.base.bytes, which is the
> initial size of the replication data chunk. By default this parameter should 
> be
> equal to replica.fetch.max.bytes so the replication process will work as
> before.
> 
> 2. If the ReplicaFetcherThread fails when trying to replicate a message
> bigger than the current replication chunk, we increase it twofold (or up to
> replica.fetch.max.bytes, whichever is smaller) and retry.
> 
> 3. If the chunk is replicated successfully, we try to decrease the size of
> replication chunk back to replica.fetch.base.bytes.
> 
> 
> By choosing replica.fetch.base.bytes in an optimal way (in our case ~200K),
> we were able to significantly decrease memory usage without any noticeable
> impact on replication efficiency.
> 
> Here is the JIRA ticket (with PR):
> https://issues.apache.org/jira/browse/KAFKA-3979
> 
> Your comments and feedback are highly appreciated!
> 
> 
> Thanks,
> Andrey.
>>> 
>> 
> 
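
For readers skimming the thread: Andrey's steps 1-3 amount to a grow-on-failure, reset-on-success controller. A hedged sketch (class and method names are invented for illustration):

{code}
// Invented names, illustration only: the adaptive chunk size from the quoted
// proposal. Double on a too-large message, fall back to base after success.
public class AdaptiveFetchSizeSketch {
    private final int baseBytes; // replica.fetch.base.bytes
    private final int maxBytes;  // replica.fetch.max.bytes
    private int current;

    public AdaptiveFetchSizeSketch(int baseBytes, int maxBytes) {
        this.baseBytes = baseBytes;
        this.maxBytes = maxBytes;
        this.current = baseBytes;
    }

    public int onMessageTooLarge() { // step 2: grow twofold, capped at max
        current = Math.min(current * 2, maxBytes);
        return current;
    }

    public int onSuccess() {         // step 3: shrink back to the base size
        current = baseBytes;
        return current;
    }
}
{code}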



[GitHub] kafka pull request #1685: hotfix: move streams application reset tool from t...

2016-07-29 Thread mjsax
GitHub user mjsax opened a pull request:

https://github.com/apache/kafka/pull/1685

hotfix: move streams application reset tool from tools to core

=> required to remove core dependency from tools

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mjsax/kafka moveResetTool

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1685.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1685


commit 858585e857cc163445142387f183cd5ca5418718
Author: Matthias J. Sax 
Date:   2016-07-29T22:34:18Z

hotfix: move streams application reset tool from tools to core
=> required to remove dependency on core from tools




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4000) Consumer per-topic metrics do not aggregate partitions from the same topic

2016-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400130#comment-15400130
 ] 

ASF GitHub Bot commented on KAFKA-4000:
---

GitHub user vahidhashemian opened a pull request:

https://github.com/apache/kafka/pull/1684

KAFKA-4000: Aggregate partitions of each topic for consumer metrics

Fix the consumer metric collection to record metrics per topic instead of 
per partition of topic.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vahidhashemian/kafka KAFKA-4000

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1684.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1684


commit a69c8fe3d1a9d99310f99db4821a6d70cf06104b
Author: Vahid Hashemian 
Date:   2016-07-29T20:55:14Z

KAFKA-4000: Aggregate partitions of each topic for consumer metrics

Fix the consumer metric collection to record metrics per topic instead of 
per partition of topic.




> Consumer per-topic metrics do not aggregate partitions from the same topic
> --
>
> Key: KAFKA-4000
> URL: https://issues.apache.org/jira/browse/KAFKA-4000
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1, 0.10.0.0
>Reporter: Jason Gustafson
>Assignee: Vahid Hashemian
>Priority: Minor
>
> In the Consumer Fetcher code, we have per-topic fetch metrics, but they seem 
> to be computed from each partition separately. It seems like we should 
> aggregate them by topic.
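
A hedged sketch of the aggregation being asked for (the map-based form is a simplification; the real fix works on the consumer's metrics sensors):

{code}
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.common.TopicPartition;

// Simplified illustration: roll per-partition fetch sizes up to one figure
// per topic before recording a metric, instead of one metric per partition.
public class TopicAggregationSketch {
    static Map<String, Long> bytesByTopic(Map<TopicPartition, Long> byPartition) {
        Map<String, Long> byTopic = new HashMap<>();
        for (Map.Entry<TopicPartition, Long> e : byPartition.entrySet())
            byTopic.merge(e.getKey().topic(), e.getValue(), Long::sum);
        return byTopic;
    }
}
{code}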



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1684: KAFKA-4000: Aggregate partitions of each topic for...

2016-07-29 Thread vahidhashemian
GitHub user vahidhashemian opened a pull request:

https://github.com/apache/kafka/pull/1684

KAFKA-4000: Aggregate partitions of each topic for consumer metrics

Fix the consumer metric collection to record metrics per topic instead of 
per partition of topic.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vahidhashemian/kafka KAFKA-4000

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1684.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1684


commit a69c8fe3d1a9d99310f99db4821a6d70cf06104b
Author: Vahid Hashemian 
Date:   2016-07-29T20:55:14Z

KAFKA-4000: Aggregate partitions of each topic for consumer metrics

Fix the consumer metric collection to record metrics per topic instead of 
per partition of topic.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is back to normal : kafka-0.10.0-jdk7 #172

2016-07-29 Thread Apache Jenkins Server
See 



[jira] [Created] (KAFKA-4007) Improve fetch pipelining for low values of max.poll.records

2016-07-29 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-4007:
--

 Summary: Improve fetch pipelining for low values of 
max.poll.records
 Key: KAFKA-4007
 URL: https://issues.apache.org/jira/browse/KAFKA-4007
 Project: Kafka
  Issue Type: Improvement
  Components: consumer
Reporter: Jason Gustafson


Currently the consumer will only send a prefetch for a partition after all the 
records from the previous fetch have been consumed. This can lead to suboptimal 
pipelining when max.poll.records is set very low since the processing latency 
for a small set of records may be small compared to the latency of a fetch. An 
improvement suggested by [~junrao] is to send the fetch anyway even if we have 
unprocessed data buffered, but delay reading it from the socket until that data 
has been consumed. Potentially the consumer can delay reading _any_ pending 
fetch until it is ready to be returned to the user, which may help control 
memory better. 
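
The pipelining idea, reduced to a generic sketch (the fetch helper below is a stand-in, not consumer internals): keep one fetch in flight while the previous batch is processed, so network latency overlaps with processing.

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Generic prefetch pipeline, illustration only.
public class PrefetchSketch {
    static final ExecutorService io = Executors.newSingleThreadExecutor();

    static Future<int[]> fetchBatch() { // stand-in for a network fetch
        return io.submit(() -> { Thread.sleep(50); return new int[]{1, 2, 3}; });
    }

    public static void main(String[] args) throws Exception {
        Future<int[]> inFlight = fetchBatch();
        for (int round = 0; round < 3; round++) {
            int[] batch = inFlight.get(); // read the completed fetch
            inFlight = fetchBatch();      // prefetch while we process below
            for (int record : batch) {
                // process record (cheap when max.poll.records is small)
            }
        }
        io.shutdown();
    }
}
{code}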



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2063) Bound fetch response size

2016-07-29 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400037#comment-15400037
 ] 

Jason Gustafson commented on KAFKA-2063:


Also, I actually prefer [~benstopford]'s suggestion to make the fetching order 
deterministic according to the order of the partitions in the request. If we 
want randomization, it would be trivial to do that on the client side, but this 
would allow for potentially cleverer approaches such as using round-robin (as 
suggested).
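
Client-side randomisation, if it moved there, could indeed be tiny; a sketch with an illustrative partition list:

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// If randomisation moved client-side as suggested, it could be as small as
// shuffling the partition order before building each fetch request.
public class ClientShuffleSketch {
    public static void main(String[] args) {
        List<String> partitions =
                new ArrayList<>(Arrays.asList("test-0", "test-1", "test-2"));
        Collections.shuffle(partitions); // the broker then fills in this order
        System.out.println(partitions);
    }
}
{code}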

> Bound fetch response size
> -
>
> Key: KAFKA-2063
> URL: https://issues.apache.org/jira/browse/KAFKA-2063
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jay Kreps
>
> Currently the only bound on the fetch response size is 
> max.partition.fetch.bytes * num_partitions. There are two problems:
> 1. First this bound is often large. You may choose 
> max.partition.fetch.bytes=1MB to enable messages of up to 1MB. However if you 
> also need to consume 1k partitions this means you may receive a 1GB response 
> in the worst case!
> 2. The actual memory usage is unpredictable. Partition assignment changes, 
> and you only actually get the full fetch amount when you are behind and there 
> is a full chunk of data ready. This means an application that seems to work 
> fine will suddenly OOM when partitions shift or when the application falls 
> behind.
> We need to decouple the fetch response size from the number of partitions.
> The proposal for doing this would be to add a new field to the fetch request, 
> max_bytes which would control the maximum data bytes we would include in the 
> response.
> The implementation on the server side would grab data from each partition in 
> the fetch request until it hit this limit, then send back just the data for 
> the partitions that fit in the response. The implementation would need to 
> start from a random position in the list of topics included in the fetch 
> request to ensure that in a case of backlog we fairly balance between 
> partitions (to avoid first giving just the first partition until that is 
> exhausted, then the next partition, etc).
> This setting will make the max.partition.fetch.bytes field in the fetch 
> request much less useful and we should discuss just getting rid of it.
> I believe this also solves the same thing we were trying to address in 
> KAFKA-598. The max_bytes setting now becomes the new limit that would need to 
> be compared to max_message size. This can be much larger--e.g. setting a 50MB 
> max_bytes setting would be okay, whereas now if you set 50MB you may need to 
> allocate 50MB*num_partitions.
> This will require evolving the fetch request protocol version to add the new 
> field and we should do a KIP for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2063) Bound fetch response size

2016-07-29 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1544#comment-1544
 ] 

Jason Gustafson commented on KAFKA-2063:


I think the problem with keeping the partition-level maximum is that you still 
have to set it with an idea about how many partitions will be included in the 
fetch request. That seems difficult for a user to do since it depends on the 
number of partitions the consumer has been assigned and how they're distributed 
among the brokers. It also seems like there's some risk that you may not be 
able to fill a fetch response completely. For example, if max.fetch.bytes is 
10MB, but max.partition.fetch.bytes is 1MB, then you'll only be able to fill a 
fetch response if you fetch more than 10 partitions. Is that a problem? Maybe, 
maybe not. I guess it depends on how well the consumer can pipeline fetching.

> Bound fetch response size
> -
>
> Key: KAFKA-2063
> URL: https://issues.apache.org/jira/browse/KAFKA-2063
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jay Kreps
>
> Currently the only bound on the fetch response size is 
> max.partition.fetch.bytes * num_partitions. There are two problems:
> 1. First this bound is often large. You may choose 
> max.partition.fetch.bytes=1MB to enable messages of up to 1MB. However if you 
> also need to consume 1k partitions this means you may receive a 1GB response 
> in the worst case!
> 2. The actual memory usage is unpredictable. Partition assignment changes, 
> and you only actually get the full fetch amount when you are behind and there 
> is a full chunk of data ready. This means an application that seems to work 
> fine will suddenly OOM when partitions shift or when the application falls 
> behind.
> We need to decouple the fetch response size from the number of partitions.
> The proposal for doing this would be to add a new field to the fetch request, 
> max_bytes which would control the maximum data bytes we would include in the 
> response.
> The implementation on the server side would grab data from each partition in 
> the fetch request until it hit this limit, then send back just the data for 
> the partitions that fit in the response. The implementation would need to 
> start from a random position in the list of topics included in the fetch 
> request to ensure that in a case of backlog we fairly balance between 
> partitions (to avoid first giving just the first partition until that is 
> exhausted, then the next partition, etc).
> This setting will make the max.partition.fetch.bytes field in the fetch 
> request much less useful and we should discuss just getting rid of it.
> I believe this also solves the same thing we were trying to address in 
> KAFKA-598. The max_bytes setting now becomes the new limit that would need to 
> be compared to max_message size. This can be much larger--e.g. setting a 50MB 
> max_bytes setting would be okay, whereas now if you set 50MB you may need to 
> allocate 50MB*num_partitions.
> This will require evolving the fetch request protocol version to add the new 
> field and we should do a KIP for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4006) Kafka connect fails sometime with InvalidTopicException in distributed mode

2016-07-29 Thread Sumit Arrawatia (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Arrawatia updated KAFKA-4006:
---
Description: 
I see this when trying to spin up a 3-node distributed Connect cluster.

Sometimes one of the workers fails to boot with the following error when auto 
topic creation is enabled: 

org.apache.kafka.common.errors.InvalidTopicException: Topic 'default.config' is 
invalid 

default.config is the topic name for Connect config. 

Also, starting the worker again fixes the issue. 



  was:
I see this when trying to spin up a 3-node distributed Connect cluster.

Sometimes one of the workers fails to boot with the following error when auto 
topic creation is enabled: 
`org.apache.kafka.common.errors.InvalidTopicException: Topic 'default.config' 
is invalid`. 

`default.config` is the topic name for Connect config. 

Also, starting the worker again fixes the issue. 




> Kafka connect fails sometime with InvalidTopicException in distributed mode
> ---
>
> Key: KAFKA-4006
> URL: https://issues.apache.org/jira/browse/KAFKA-4006
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.0.0
>Reporter: Sumit Arrawatia
>Assignee: Ewen Cheslack-Postava
>
> I see this when trying to spin up a 3-node distributed Connect cluster.
> Sometimes one of the workers fails to boot with the following error when auto 
> topic creation is enabled: 
> org.apache.kafka.common.errors.InvalidTopicException: Topic 'default.config' 
> is invalid 
> default.config is the topic name for Connect config. 
> Also, starting the worker again fixes the issue. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4006) Kafka connect fails sometime with InvalidTopicException in distributed mode

2016-07-29 Thread Sumit Arrawatia (JIRA)
Sumit Arrawatia created KAFKA-4006:
--

 Summary: Kafka connect fails sometime with InvalidTopicException 
in distributed mode
 Key: KAFKA-4006
 URL: https://issues.apache.org/jira/browse/KAFKA-4006
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Affects Versions: 0.10.0.0
Reporter: Sumit Arrawatia
Assignee: Ewen Cheslack-Postava


I see this when trying to spin up a 3-node distributed Connect cluster.

Sometimes one of the workers fails to boot with the following error when auto 
topic creation is enabled: 
`org.apache.kafka.common.errors.InvalidTopicException: Topic 'default.config' 
is invalid`. 

`default.config` is the topic name for Connect config. 

Also, starting the worker again fixes the issue. 





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2063) Bound fetch response size

2016-07-29 Thread Ben Stopford (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399960#comment-15399960
 ] 

Ben Stopford commented on KAFKA-2063:
-

[~jkreps] OK, let's see what GZ says on (1). I think it should be ok though. 

On (2), that's a fair point. Happy to stick with randomisation. 

> Bound fetch response size
> -
>
> Key: KAFKA-2063
> URL: https://issues.apache.org/jira/browse/KAFKA-2063
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jay Kreps
>
> Currently the only bound on the fetch response size is 
> max.partition.fetch.bytes * num_partitions. There are two problems:
> 1. First this bound is often large. You may choose 
> max.partition.fetch.bytes=1MB to enable messages of up to 1MB. However if you 
> also need to consume 1k partitions this means you may receive a 1GB response 
> in the worst case!
> 2. The actual memory usage is unpredictable. Partition assignment changes, 
> and you only actually get the full fetch amount when you are behind and there 
> is a full chunk of data ready. This means an application that seems to work 
> fine will suddenly OOM when partitions shift or when the application falls 
> behind.
> We need to decouple the fetch response size from the number of partitions.
> The proposal for doing this would be to add a new field to the fetch request, 
> max_bytes which would control the maximum data bytes we would include in the 
> response.
> The implementation on the server side would grab data from each partition in 
> the fetch request until it hit this limit, then send back just the data for 
> the partitions that fit in the response. The implementation would need to 
> start from a random position in the list of topics included in the fetch 
> request to ensure that in a case of backlog we fairly balance between 
> partitions (to avoid first giving just the first partition until that is 
> exhausted, then the next partition, etc).
> This setting will make the max.partition.fetch.bytes field in the fetch 
> request much less useful and we should discuss just getting rid of it.
> I believe this also solves the same thing we were trying to address in 
> KAFKA-598. The max_bytes setting now becomes the new limit that would need to 
> be compared to max_message size. This can be much larger--e.g. setting a 50MB 
> max_bytes setting would be okay, whereas now if you set 50MB you may need to 
> allocate 50MB*num_partitions.
> This will require evolving the fetch request protocol version to add the new 
> field and we should do a KIP for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2729) Cached zkVersion not equal to that in zookeeper, broker not recovering.

2016-07-29 Thread Konstantin Zadorozhny (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399928#comment-15399928
 ] 

Konstantin Zadorozhny commented on KAFKA-2729:
--

Seeing the same issue in our staging and production environments on 0.9.0.1. 
Bouncing brokers helps, but still not ideal.

The staging cluster was left to "recover" for a day. It didn't happen.

> Cached zkVersion not equal to that in zookeeper, broker not recovering.
> ---
>
> Key: KAFKA-2729
> URL: https://issues.apache.org/jira/browse/KAFKA-2729
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1
>Reporter: Danil Serdyuchenko
>
> After a small network wobble where zookeeper nodes couldn't reach each other, 
> we started seeing a large number of undereplicated partitions. The zookeeper 
> cluster recovered, however we continued to see a large number of 
> undereplicated partitions. Two brokers in the kafka cluster were showing this 
> in the logs:
> {code}
> [2015-10-27 11:36:00,888] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Shrinking ISR for 
> partition [__samza_checkpoint_event-creation_1,3] from 6,5 to 5 
> (kafka.cluster.Partition)
> [2015-10-27 11:36:00,891] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Cached zkVersion [66] 
> not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> {code}
> This was happening for all of the topics on the affected brokers. Both brokers only recovered 
> after a restart. Our own investigation yielded nothing, I was hoping you 
> could shed some light on this issue. Possibly if it's related to: 
> https://issues.apache.org/jira/browse/KAFKA-1382 , however we're using 
> 0.8.2.1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2063) Bound fetch response size

2016-07-29 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399909#comment-15399909
 ] 

Jay Kreps commented on KAFKA-2063:
--

[~benstopford] 
1. I guess the question is how do you load balance over topics. So say your 
request-level max bytes is 10MB, and let's say you have lots of backlog of 
data on all partitions, does it make sense to get 10MB of a single partition or 
would you rather get a little from each (I'm actually not sure, I think what 
you're proposing may be fine)? There may be some implications for Streams since 
it tries to align partition consumption by time (cc [~guozhang]).
2. I agree that could potentially be better though we've generally preferred 
putting the smarts on the server rather than the clients.

> Bound fetch response size
> -
>
> Key: KAFKA-2063
> URL: https://issues.apache.org/jira/browse/KAFKA-2063
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jay Kreps
>
> Currently the only bound on the fetch response size is 
> max.partition.fetch.bytes * num_partitions. There are two problems:
> 1. First this bound is often large. You may choose 
> max.partition.fetch.bytes=1MB to enable messages of up to 1MB. However if you 
> also need to consume 1k partitions this means you may receive a 1GB response 
> in the worst case!
> 2. The actual memory usage is unpredictable. Partition assignment changes, 
> and you only actually get the full fetch amount when you are behind and there 
> is a full chunk of data ready. This means an application that seems to work 
> fine will suddenly OOM when partitions shift or when the application falls 
> behind.
> We need to decouple the fetch response size from the number of partitions.
> The proposal for doing this would be to add a new field to the fetch request, 
> max_bytes which would control the maximum data bytes we would include in the 
> response.
> The implementation on the server side would grab data from each partition in 
> the fetch request until it hit this limit, then send back just the data for 
> the partitions that fit in the response. The implementation would need to 
> start from a random position in the list of topics included in the fetch 
> request to ensure that in a case of backlog we fairly balance between 
> partitions (to avoid first giving just the first partition until that is 
> exhausted, then the next partition, etc).
> This setting will make the max.partition.fetch.bytes field in the fetch 
> request much less useful and we should discuss just getting rid of it.
> I believe this also solves the same thing we were trying to address in 
> KAFKA-598. The max_bytes setting now becomes the new limit that would need to 
> be compared to max_message size. This can be much larger--e.g. setting a 50MB 
> max_bytes setting would be okay, whereas now if you set 50MB you may need to 
> allocate 50MB*num_partitions.
> This will require evolving the fetch request protocol version to add the new 
> field and we should do a KIP for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: kafka-0.10.0-jdk7 #171

2016-07-29 Thread Apache Jenkins Server
See 

Changes:

[ismael] MINOR: Replace reference to HoppingWindows in streams.html

--
[...truncated 5724 lines...]
[unit/integration test listing elided: the truncated log lists org.apache.kafka.streams kstream, processor, and StreamsConfig test results, all PASSED; the listing breaks off mid-entry and the actual failure is not captured in this excerpt]

Re: [DISCUSS] KAFKA-2063 Add possibility to bound fetch response size (was Re: [DISCUSS] Optimise memory used by replication process by using adaptive fetch message size)

2016-07-29 Thread Ben Stopford
Thanks for kicking this one off, Andrey. Generally it looks great! 

I left a comment on the Jira regarding whether we should remove the existing 
limitBytes, along with a potential alternative to doing randomisation. 

B
> On 29 Jul 2016, at 09:17, Andrey L. Neporada  wrote:
> 
> Hi all!
> 
> I would like to get your feedback on the PR for bug KAFKA-2063.
> Looks like a KIP is needed there, but it would be nice to get feedback first.
> 
> Thanks,
> Andrey.
> 
> 
>> On 22 Jul 2016, at 12:26, Andrey L. Neporada  
>> wrote:
>> 
>> Hi!
>> 
>> Thanks for the feedback - I agree that the proper way to fix this issue is to 
>> provide a per-request data limit.
>> Will try to do it.
>> 
>> Thanks,
>> Andrey.
>> 
>> 
>> 
>>> On 21 Jul 2016, at 18:57, Jay Kreps  wrote:
>>> 
>>> I think the memory usage for consumers can be improved a lot, but I think
>>> there may be a better way than what you are proposing.
>>> 
>>> The problem is exactly what you describe: the bound the user sets is
>>> per-partition, but the number of partitions may be quite high. The consumer
>>> could provide a bound on the response size by only requesting a subset of
>>> the partitions, but this would mean that if there was no data available on
>>> those partitions the consumer wouldn't be checking other partitions, which
>>> would add latency.
>>> 
>>> I think the solution is to add a new "max response size" parameter to the
>>> fetch request so the server checks all partitions but doesn't send back
>>> more than this amount in total. This has to be done carefully to ensure
>>> fairness (i.e. if one partition has unbounded amounts of data it shouldn't
>>> indefinitely starve other partitions).
>>> 
>>> This will fix memory management both in the replicas and for consumers.
>>> 
>>> There is a JIRA for this: https://issues.apache.org/jira/browse/KAFKA-2063
>>> 
>>> I think it isn't too hard to do and would be a huge aid to the memory
>>> profile of both the clients and server.
>>> 
>>> I also don't think there is much use in setting a max size that expands
>>> dynamically since in any case you have to be able to support the maximum,
>>> so you might as well always use that rather than expanding and contracting
>>> dynamically. That is, if your max fetch response size is 64MB you need to
>>> budget 64MB of free memory, so making it smaller some of the time doesn't
>>> really help you.
>>> 
>>> -Jay
>>> 
>>> On Thu, Jul 21, 2016 at 2:49 AM, Andrey L. Neporada <
>>> anepor...@yandex-team.ru> wrote:
>>> 
 Hi all!
 
 We noticed that our Kafka cluster uses a lot of memory for replication.
 Our Kafka usage pattern is as follows:
 
 1. Most messages are small (tens or hundreds of kilobytes at most), but some
 (rare) messages can be several megabytes. So, we have to set
 replica.fetch.max.bytes = max.message.bytes = 8MB
 2. Each Kafka broker handles several thousand partitions from multiple
 topics.
 
 In this scenario, the total memory required for replication (i.e.
 replica.fetch.max.bytes * numOfPartitions) is unreasonably big.
 
 So we would like to propose the following approach to fix this problem:
 
 1. Introduce a new config parameter, replica.fetch.base.bytes, which is the
 initial size of the replication data chunk. By default this parameter should be
 equal to replica.fetch.max.bytes, so the replication process will work as
 before.
 
 2. If the ReplicaFetcherThread fails when trying to replicate a message
 bigger than the current replication chunk, we increase it twofold (or up to
 replica.fetch.max.bytes, whichever is smaller) and retry.
 
 3. If the chunk is replicated successfully, we try to decrease the size of
 the replication chunk back to replica.fetch.base.bytes.
 
 
 By choosing replica.fetch.base.bytes in an optimal way (in our case ~200K),
 we were able to significantly decrease memory usage without any noticeable
 impact on replication efficiency.
 
 Here is JIRA ticket (with PR):
 https://issues.apache.org/jira/browse/KAFKA-3979
 
 Your comments and feedback are highly appreciated!
 
 
 Thanks,
 Andrey.
>> 
> 
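For illustration, the adaptive scheme quoted above might look like the minimal sketch below (class and method names are assumptions, not the actual KAFKA-3979 patch): start at replica.fetch.base.bytes, double whenever a message does not fit, cap at replica.fetch.max.bytes, and fall back to the base size after a success.

{code}
// Minimal sketch of the proposed adaptive replication fetch size
// (hypothetical names; not the actual KAFKA-3979 patch).
public class AdaptiveFetchSize {
    private final int baseBytes; // replica.fetch.base.bytes, e.g. ~200K
    private final int maxBytes;  // replica.fetch.max.bytes, e.g. 8MB
    private int current;

    public AdaptiveFetchSize(int baseBytes, int maxBytes) {
        this.baseBytes = baseBytes;
        this.maxBytes = maxBytes;
        this.current = baseBytes;
    }

    // Fetch failed because the next message did not fit: double the chunk,
    // capped at replica.fetch.max.bytes, and retry with the new size.
    public int onMessageTooLarge() {
        current = Math.min(current * 2, maxBytes);
        return current;
    }

    // Chunk replicated successfully: shrink back to the base size.
    public int onSuccess() {
        current = baseBytes;
        return current;
    }
}
{code}

Sized this way, steady-state replication buffers cost numOfPartitions * replica.fetch.base.bytes rather than numOfPartitions * replica.fetch.max.bytes (with the numbers above, ~200K * 4,000 partitions is about 800 MB versus 8MB * 4,000 = 32 GB), at the price of extra round trips whenever an oversized message shows up.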



[jira] [Commented] (KAFKA-2063) Bound fetch response size

2016-07-29 Thread Ben Stopford (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399855#comment-15399855
 ] 

Ben Stopford commented on KAFKA-2063:
-

Thanks for kicking this one off, Andrey. A couple of things come to mind on 
this:

1. I think it would make some sense if we also removed/deprecated the partition 
level maxBytes. Having both is a little confusing to the user. That way each 
request would just fill up to the request-level limitBytes using as many 
partitions as are needed to do so. 

2. I do wonder if randomisation is actually the best approach to this. A 
potentially better approach would be to adjust the order of the partitions 
passed by the consumer. So if we have a request for 8 partitions, we would pass 
partitions 0-7 to the server. The server might use partitions 0-3 to fill the 
response up to limitBytes. The consumer would then send the next fetch request 
with partitions ordered: 4,5,6,7,0,1,2,3. In this way we'd achieve reasonable 
fairness whilst also retaining some level of determinism. 
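To make the reordering concrete, a rough sketch (illustrative only, not part of any patch):

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: rotate the partition order so that partitions served in
// the previous response move to the back of the next fetch request.
public class PartitionRotation {
    static <T> List<T> rotate(List<T> order, int servedCount) {
        List<T> next = new ArrayList<>(order.subList(servedCount, order.size()));
        next.addAll(order.subList(0, servedCount));
        return next;
    }

    public static void main(String[] args) {
        List<Integer> order = Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7);
        // The server filled the response from partitions 0-3, so the next
        // request leads with the partitions that were not served:
        System.out.println(rotate(order, 4)); // [4, 5, 6, 7, 0, 1, 2, 3]
    }
}
{code}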



> Bound fetch response size
> -
>
> Key: KAFKA-2063
> URL: https://issues.apache.org/jira/browse/KAFKA-2063
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jay Kreps
>
> Currently the only bound on the fetch response size is 
> max.partition.fetch.bytes * num_partitions. There are two problems:
> 1. First, this bound is often large. You may choose 
> max.partition.fetch.bytes=1MB to enable messages of up to 1MB. However if you 
> also need to consume 1k partitions this means you may receive a 1GB response 
> in the worst case!
> 2. The actual memory usage is unpredictable. Partition assignment changes, 
> and you only actually get the full fetch amount when you are behind and there 
> is a full chunk of data ready. This means an application that seems to work 
> fine will suddenly OOM when partitions shift or when the application falls 
> behind.
> We need to decouple the fetch response size from the number of partitions.
> The proposal for doing this would be to add a new field to the fetch request, 
> max_bytes which would control the maximum data bytes we would include in the 
> response.
> The implementation on the server side would grab data from each partition in 
> the fetch request until it hit this limit, then send back just the data for 
> the partitions that fit in the response. The implementation would need to 
> start from a random position in the list of topics included in the fetch 
> request to ensure that in a case of backlog we fairly balance between 
> partitions (to avoid first giving just the first partition until that is 
> exhausted, then the next partition, etc).
> This setting will make the max.partition.fetch.bytes field in the fetch 
> request much less useful and we should discuss just getting rid of it.
> I believe this also solves the same thing we were trying to address in 
> KAFKA-598. The max_bytes setting now becomes the new limit that would need to 
> be compared to the max message size. This can be much larger--e.g. a 50MB 
> max_bytes setting would be okay, whereas now if you set 50MB you may need to 
> allocate 50MB*num_partitions.
> This will require evolving the fetch request protocol version to add the new 
> field and we should do a KIP for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-3766) Unhandled "not enough replicas" errors in SyncGroup

2016-07-29 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-3766.

Resolution: Duplicate

> Unhandled "not enough replicas" errors in SyncGroup
> ---
>
> Key: KAFKA-3766
> URL: https://issues.apache.org/jira/browse/KAFKA-3766
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>
> Caught by [~ijuma]. We seem to be missing at least a couple of error codes when 
> handling the append log response when writing group metadata to the offsets 
> topic in the SyncGroup handler. In particular, we are missing checks for 
> NOT_ENOUGH_REPLICAS and NOT_ENOUGH_REPLICAS_AFTER_APPEND. Currently these 
> errors are returned directly in the SyncGroup response and cause an exception 
> to be raised to the user.
> There are two options to fix this problem:
> 1. We can continue to return these error codes in the sync group response and 
> add a handler on the client side to retry.
> 2. We can convert the errors on the server to something like 
> COORDINATOR_NOT_AVAILABLE, which will cause the client to retry with the 
> existing logic.
> The second option seems a little nicer to avoid exposing the internal 
> implementation of the SyncGroup request (i.e. that we write group metadata to 
> a partition). It also has the nice side effect of fixing old clients 
> automatically when the server is upgraded.
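A sketch of what option 2 could look like (NOT_ENOUGH_REPLICAS and NOT_ENOUGH_REPLICAS_AFTER_APPEND are real Kafka error codes; the mapping function itself is hypothetical, not the eventual patch):

{code}
import org.apache.kafka.common.protocol.Errors;

// Hypothetical illustration of option 2: translate internal log-append errors
// into a retriable coordinator error so existing client retry logic kicks in.
public class SyncGroupErrorMapping {
    static Errors mapSyncGroupError(Errors appendError) {
        switch (appendError) {
            case NOT_ENOUGH_REPLICAS:
            case NOT_ENOUGH_REPLICAS_AFTER_APPEND:
                return Errors.GROUP_COORDINATOR_NOT_AVAILABLE;
            default:
                return appendError;
        }
    }
}
{code}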



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[DISCUSS] KAFKA-2063 Add possibility to bound fetch response size (was Re: [DISCUSS] Optimise memory used by replication process by using adaptive fetch message size)

2016-07-29 Thread Andrey L . Neporada
Hi all!

I would like to get your feedback on the PR for bug KAFKA-2063.
Looks like a KIP is needed there, but it would be nice to get feedback first.

Thanks,
Andrey.


> On 22 Jul 2016, at 12:26, Andrey L. Neporada  wrote:
> 
> Hi!
> 
> Thanks for the feedback - I agree that the proper way to fix this issue is to 
> provide a per-request data limit.
> Will try to do it.
> 
> Thanks,
> Andrey.
> 
> 
> 
>> On 21 Jul 2016, at 18:57, Jay Kreps  wrote:
>> 
>> I think the memory usage for consumers can be improved a lot, but I think
>> there may be a better way than what you are proposing.
>> 
>> The problem is exactly what you describe: the bound the user sets is
>> per-partition, but the number of partitions may be quite high. The consumer
>> could provide a bound on the response size by only requesting a subset of
>> the partitions, but this would mean that if there was no data available on
>> those partitions the consumer wouldn't be checking other partitions, which
>> would add latency.
>> 
>> I think the solution is to add a new "max response size" parameter to the
>> fetch request so the server checks all partitions but doesn't send back
>> more than this amount in total. This has to be done carefully to ensure
>> fairness (i.e. if one partition has unbounded amounts of data it shouldn't
>> indefinitely starve other partitions).
>> 
>> This will fix memory management both in the replicas and for consumers.
>> 
>> There is a JIRA for this: https://issues.apache.org/jira/browse/KAFKA-2063
>> 
>> I think it isn't too hard to do and would be a huge aid to the memory
>> profile of both the clients and server.
>> 
>> I also don't think there is much use in setting a max size that expands
>> dynamically since in any case you have to be able to support the maximum,
>> so you might as well always use that rather than expanding and contracting
>> dynamically. That is, if your max fetch response size is 64MB you need to
>> budget 64MB of free memory, so making it smaller some of the time doesn't
>> really help you.
>> 
>> -Jay
>> 
>> On Thu, Jul 21, 2016 at 2:49 AM, Andrey L. Neporada <
>> anepor...@yandex-team.ru> wrote:
>> 
>>> Hi all!
>>> 
>>> We noticed that our Kafka cluster uses a lot of memory for replication.
>>> Our Kafka usage pattern is as follows:
>>> 
>>> 1. Most messages are small (tens or hundreds of kilobytes at most), but some
>>> (rare) messages can be several megabytes. So, we have to set
>>> replica.fetch.max.bytes = max.message.bytes = 8MB
>>> 2. Each Kafka broker handles several thousand partitions from multiple
>>> topics.
>>> 
>>> In this scenario, the total memory required for replication (i.e.
>>> replica.fetch.max.bytes * numOfPartitions) is unreasonably big.
>>> 
>>> So we would like to propose the following approach to fix this problem:
>>> 
>>> 1. Introduce a new config parameter, replica.fetch.base.bytes, which is the
>>> initial size of the replication data chunk. By default this parameter should be
>>> equal to replica.fetch.max.bytes, so the replication process will work as
>>> before.
>>> 
>>> 2. If the ReplicaFetcherThread fails when trying to replicate a message
>>> bigger than the current replication chunk, we increase it twofold (or up to
>>> replica.fetch.max.bytes, whichever is smaller) and retry.
>>> 
>>> 3. If the chunk is replicated successfully, we try to decrease the size of
>>> the replication chunk back to replica.fetch.base.bytes.
>>> 
>>> 
>>> By choosing replica.fetch.base.bytes in an optimal way (in our case ~200K),
>>> we were able to significantly decrease memory usage without any noticeable
>>> impact on replication efficiency.
>>> 
>>> Here is JIRA ticket (with PR):
>>> https://issues.apache.org/jira/browse/KAFKA-3979
>>> 
>>> Your comments and feedback are highly appreciated!
>>> 
>>> 
>>> Thanks,
>>> Andrey.
> 



[jira] [Commented] (KAFKA-2063) Bound fetch response size

2016-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399590#comment-15399590
 ] 

ASF GitHub Bot commented on KAFKA-2063:
---

GitHub user nepal opened a pull request:

https://github.com/apache/kafka/pull/1683

KAFKA-2063: Add possibility to bound fetch response size

Details are:

1. The protocol is extended with a new version of FetchRequest carrying an 
extra parameter, limit_bytes.
2. A new config setting, fetch.limit.bytes, is added. It is set to zero by 
default (which means "no limit"), so we preserve the old behaviour by default.
3. When the broker receives a FetchRequest with limit_bytes != 0, it performs 
a random shuffle of the partitions before reading starts. This way we can ensure 
that no starvation can happen for "slow" partitions.
4. For each partition we read up to the max_bytes requested for this partition, 
even if the current value of limit_bytes is greater than zero but less than 
max_bytes. This way we can properly handle the case when the next message in a 
given partition is larger than the current value of limit_bytes (see the sketch 
below).
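A minimal, self-contained sketch of the fill loop described in points 3 and 4 (names and types are simplified stand-ins, not the code from this PR):

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the server-side fill logic (not the PR code).
public class FetchFillSketch {
    // partitions: requested partitions; data: bytes available per partition
    // (already capped at its per-partition max_bytes); limitBytes: 0 = "no limit".
    static Map<String, byte[]> fill(List<String> partitions,
                                    Map<String, byte[]> data,
                                    long limitBytes) {
        List<String> order = new ArrayList<>(partitions);
        Collections.shuffle(order); // point 3: no starvation of "slow" partitions
        Map<String, byte[]> response = new LinkedHashMap<>();
        long remaining = limitBytes;
        for (String tp : order) {
            if (limitBytes != 0 && remaining <= 0)
                break; // budget exhausted
            byte[] chunk = data.getOrDefault(tp, new byte[0]);
            // point 4: take up to the per-partition max_bytes even when that
            // overshoots the remaining budget, so a single message larger than
            // 'remaining' can still be delivered
            response.put(tp, chunk);
            remaining -= chunk.length;
        }
        return response;
    }
}
{code}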

 
 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nepal/kafka fetch-request-add-limit

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1683.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1683


commit 4f703539bcd46058d036bafd300b98a93dec26a5
Author: Andrey L. Neporada 
Date:   2016-07-29T12:28:07Z

KAFKA-2063: Add possibility to bound fetch response size




> Bound fetch response size
> -
>
> Key: KAFKA-2063
> URL: https://issues.apache.org/jira/browse/KAFKA-2063
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jay Kreps
>
> Currently the only bound on the fetch response size is 
> max.partition.fetch.bytes * num_partitions. There are two problems:
> 1. First, this bound is often large. You may choose 
> max.partition.fetch.bytes=1MB to enable messages of up to 1MB. However if you 
> also need to consume 1k partitions this means you may receive a 1GB response 
> in the worst case!
> 2. The actual memory usage is unpredictable. Partition assignment changes, 
> and you only actually get the full fetch amount when you are behind and there 
> is a full chunk of data ready. This means an application that seems to work 
> fine will suddenly OOM when partitions shift or when the application falls 
> behind.
> We need to decouple the fetch response size from the number of partitions.
> The proposal for doing this would be to add a new field to the fetch request, 
> max_bytes which would control the maximum data bytes we would include in the 
> response.
> The implementation on the server side would grab data from each partition in 
> the fetch request until it hit this limit, then send back just the data for 
> the partitions that fit in the response. The implementation would need to 
> start from a random position in the list of topics included in the fetch 
> request to ensure that in a case of backlog we fairly balance between 
> partitions (to avoid first giving just the first partition until that is 
> exhausted, then the next partition, etc).
> This setting will make the max.partition.fetch.bytes field in the fetch 
> request much less useful and we should discuss just getting rid of it.
> I believe this also solves the same thing we were trying to address in 
> KAFKA-598. The max_bytes setting now becomes the new limit that would need to 
> be compared to the max message size. This can be much larger--e.g. a 50MB 
> max_bytes setting would be okay, whereas now if you set 50MB you may need to 
> allocate 50MB*num_partitions.
> This will require evolving the fetch request protocol version to add the new 
> field and we should do a KIP for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1683: KAFKA-2063: Add possibility to bound fetch respons...

2016-07-29 Thread nepal
GitHub user nepal opened a pull request:

https://github.com/apache/kafka/pull/1683

KAFKA-2063: Add possibility to bound fetch response size

Details are:

1. The protocol is extended with a new version of FetchRequest carrying an 
extra parameter, limit_bytes.
2. A new config setting, fetch.limit.bytes, is added. It is set to zero by 
default (which means "no limit"), so we preserve the old behaviour by default.
3. When the broker receives a FetchRequest with limit_bytes != 0, it performs 
a random shuffle of the partitions before reading starts. This way we can ensure 
that no starvation can happen for "slow" partitions.
4. For each partition we read up to the max_bytes requested for this partition, 
even if the current value of limit_bytes is greater than zero but less than 
max_bytes. This way we can properly handle the case when the next message in a 
given partition is larger than the current value of limit_bytes.

 
 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nepal/kafka fetch-request-add-limit

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1683.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1683


commit 4f703539bcd46058d036bafd300b98a93dec26a5
Author: Andrey L. Neporada 
Date:   2016-07-29T12:28:07Z

KAFKA-2063: Add possibility to bound fetch response size




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Comment Edited] (KAFKA-3990) Kafka New Producer may raise an OutOfMemoryError

2016-07-29 Thread Brice Dutheil (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399572#comment-15399572
 ] 

Brice Dutheil edited comment on KAFKA-3990 at 7/29/16 4:04 PM:
---

Hi, after further investigation we found out that the issue arose because we 
switched from bamboo to marathon-lb, and marathon-lb opens HTTP port 9091 
(https://github.com/mesosphere/marathon-lb#operational-best-practices); we 
missed that during the upgrade.

Doing a curl on this port, which was supposed to be configured for a Kafka 
broker, we got an HTTP response:

{code}
> curl -v dockerhost:9091
* About to connect() to dockerhost port 9091 (#0)
*   Trying 172.17.42.1...
* Connected to dockerhost (172.17.42.1) port 9091 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: dockerhost:9091
> Accept: */*
>
* Empty reply from server
* Connection #0 to host dockerhost left intact
curl: (52) Empty reply from server
{code}

*However I'm surprised Kafka / clients don't check the validity of the payload, 
at least upon establishment of the connection.*


was (Author: bric3):
Hi, after further investigation we found out that the issue arose because we 
switched from bamboo to marathon-lb, and marathon-lb opens HTTP port 9091 
(https://github.com/mesosphere/marathon-lb#operational-best-practices); we 
missed that during the upgrade.

Doing a curl on this port, which was supposed to be configured for a Kafka 
broker, we got an HTTP response:

{code}
> curl -v dockerhost:9091
* About to connect() to dockerhost port 9091 (#0)
*   Trying 172.17.42.1...
* Connected to dockerhost (172.17.42.1) port 9091 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: dockerhost:9091
> Accept: */*
>
* Empty reply from server
* Connection #0 to host dockerhost left intact
curl: (52) Empty reply from server
{code}

However I'm surprised Kafka / clients don't check the validity of the payload.

> Kafka New Producer may raise an OutOfMemoryError
> 
>
> Key: KAFKA-3990
> URL: https://issues.apache.org/jira/browse/KAFKA-3990
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
> Environment: Docker, Base image : CentOS
> Java 8u77
>Reporter: Brice Dutheil
> Attachments: app-producer-config.log, kafka-broker-logs.zip
>
>
> We are regularly seeing OOME errors on a Kafka producer; we first saw:
> {code}
> java.lang.OutOfMemoryError: Java heap space
> at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.8.0_77]
> at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_77]
> at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at org.apache.kafka.common.network.Selector.poll(Selector.java:286) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_77]
> {code}
> This line refers to a buffer allocation {{ByteBuffer.allocate(receiveSize)}} 
> (see 
> https://github.com/apache/kafka/blob/0.9.0.1/clients/src/main/java/org/apache/kafka/common/network/NetworkReceive.java#L93)
> Usually the app runs fine within a 200/400 MB heap and a 64 MB Metaspace, and 
> we are producing small messages, 500 B at most.
> Also, the error doesn't appear in the development environment. In order to 
> identify the issue we tweaked the code to report the actual allocation size, 
> and we got this stack:
> {code}
> 09:55:49.484 [auth] [kafka-producer-network-thread | producer-1] WARN  
> o.a.k.c.n.NetworkReceive HEAP-ISSUE: constructor : Integer='-1', String='-1'
> 09:55:49.485 [auth] [kafka-producer-network-thread | producer-1] WARN  
> o.a.k.c.n.NetworkReceive HEAP-ISSUE: method : 
> NetworkReceive.readFromReadableChannel.receiveSize=1213486160
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to /tmp/tomcat.hprof ...
> Heap dump file created [69583827 bytes in 0.365 secs]
> 09:55:50.324 [auth] [kafka-producer-network-thread | producer-1] ERROR 
> o.a.k.c.utils.KafkaThread Uncaught exception in kafka-producer-network-thread 
> | producer-1: 
> java.lang.

[jira] [Updated] (KAFKA-3990) Kafka New Producer may raise an OutOfMemoryError

2016-07-29 Thread Brice Dutheil (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brice Dutheil updated KAFKA-3990:
-
Environment: 
Docker, Base image : CentOS
Java 8u77
Marathon

  was:
Docker, Base image : CentOS
Java 8u77


> Kafka New Producer may raise an OutOfMemoryError
> 
>
> Key: KAFKA-3990
> URL: https://issues.apache.org/jira/browse/KAFKA-3990
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
> Environment: Docker, Base image : CentOS
> Java 8u77
> Marathon
>Reporter: Brice Dutheil
> Attachments: app-producer-config.log, kafka-broker-logs.zip
>
>
> We are regularly seeing OOME errors on a Kafka producer; we first saw:
> {code}
> java.lang.OutOfMemoryError: Java heap space
> at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.8.0_77]
> at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_77]
> at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at org.apache.kafka.common.network.Selector.poll(Selector.java:286) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_77]
> {code}
> This line refers to a buffer allocation {{ByteBuffer.allocate(receiveSize)}} 
> (see 
> https://github.com/apache/kafka/blob/0.9.0.1/clients/src/main/java/org/apache/kafka/common/network/NetworkReceive.java#L93)
> Usually the app runs fine within a 200/400 MB heap and a 64 MB Metaspace, and 
> we are producing small messages, 500 B at most.
> Also, the error doesn't appear in the development environment. In order to 
> identify the issue we tweaked the code to report the actual allocation size, 
> and we got this stack:
> {code}
> 09:55:49.484 [auth] [kafka-producer-network-thread | producer-1] WARN  
> o.a.k.c.n.NetworkReceive HEAP-ISSUE: constructor : Integer='-1', String='-1'
> 09:55:49.485 [auth] [kafka-producer-network-thread | producer-1] WARN  
> o.a.k.c.n.NetworkReceive HEAP-ISSUE: method : 
> NetworkReceive.readFromReadableChannel.receiveSize=1213486160
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to /tmp/tomcat.hprof ...
> Heap dump file created [69583827 bytes in 0.365 secs]
> 09:55:50.324 [auth] [kafka-producer-network-thread | producer-1] ERROR 
> o.a.k.c.utils.KafkaThread Uncaught exception in kafka-producer-network-thread 
> | producer-1: 
> java.lang.OutOfMemoryError: Java heap space
>   at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.8.0_77]
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_77]
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
>  ~[kafka-clients-0.9.0.1.jar:na]
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>  ~[kafka-clients-0.9.0.1.jar:na]
>   at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153) 
> ~[kafka-clients-0.9.0.1.jar:na]
>   at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134) 
> ~[kafka-clients-0.9.0.1.jar:na]
>   at org.apache.kafka.common.network.Selector.poll(Selector.java:286) 
> ~[kafka-clients-0.9.0.1.jar:na]
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256) 
> ~[kafka-clients-0.9.0.1.jar:na]
>   at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) 
> ~[kafka-clients-0.9.0.1.jar:na]
>   at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) 
> ~[kafka-clients-0.9.0.1.jar:na]
>   at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_77]
> {code}
> Notice the size to allocate {{1213486160}} ~1.2 GB. I'm not yet sure how this 
> size is initialised.
> Notice as well that every time this OOME appears, the {{NetworkReceive}} 
> constructor at 
> https://github.com/apache/kafka/blob/0.9.0.1/clients/src/main/java/org/apache/kafka/common/network/NetworkReceive.java#L49
>  receives the parameters: {{maxSize=-1}}, {{source="-1"}}
> We may have missed configuration in our setup but kafka clients shouldn't 
> raise a
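Worth noting about the {{receiveSize=1213486160}} reported above: 1213486160 is 0x48545450, which is exactly the ASCII bytes "HTTP" read as a big-endian int, consistent with the client parsing the first four bytes of an HTTP reply as a Kafka size prefix. A small demonstration (illustrative only):

{code}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustration: if the peer is actually an HTTP server, the 4-byte size prefix
// a Kafka client reads from the wire decodes the ASCII bytes "HTTP".
public class SizePrefixDemo {
    public static void main(String[] args) {
        byte[] reply = "HTTP/1.1 200 OK".getBytes(StandardCharsets.US_ASCII);
        System.out.println(ByteBuffer.wrap(reply).getInt()); // 1213486160, ~1.2 GB
    }
}
{code}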

[jira] [Comment Edited] (KAFKA-3990) Kafka New Producer may raise an OutOfMemoryError

2016-07-29 Thread Brice Dutheil (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399572#comment-15399572
 ] 

Brice Dutheil edited comment on KAFKA-3990 at 7/29/16 4:03 PM:
---

Hi, after further investigation we found out that the issue arose because we 
switched from bamboo to marathon-lb, and marathon-lb opens HTTP port 9091 
(https://github.com/mesosphere/marathon-lb#operational-best-practices); we 
missed that during the upgrade.

Doing a curl on this port, which was supposed to be configured for a Kafka 
broker, we got an HTTP response:

{code}
> curl -v dockerhost:9091
* About to connect() to dockerhost port 9091 (#0)
*   Trying 172.17.42.1...
* Connected to dockerhost (172.17.42.1) port 9091 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: dockerhost:9091
> Accept: */*
>
* Empty reply from server
* Connection #0 to host dockerhost left intact
curl: (52) Empty reply from server
{code}

However I'm surprised Kafka / clients don't check the validity of the payload.


was (Author: bric3):
Hi, after further investigation we found out that the issue arose because we 
switched from bamboo to marathon-lb, and marathon-lb opens HTTP port 9091 
(https://github.com/mesosphere/marathon-lb#operational-best-practices); we 
missed that during the upgrade.

{code}
> curl -v dockerhost:9091
* About to connect() to dockerhost port 9091 (#0)
*   Trying 172.17.42.1...
* Connected to dockerhost (172.17.42.1) port 9091 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: dockerhost:9091
> Accept: */*
>
* Empty reply from server
* Connection #0 to host dockerhost left intact
curl: (52) Empty reply from server
{code}

However I'm surprised Kafka / clients don't check the validity of the payload.

> Kafka New Producer may raise an OutOfMemoryError
> 
>
> Key: KAFKA-3990
> URL: https://issues.apache.org/jira/browse/KAFKA-3990
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
> Environment: Docker, Base image : CentOS
> Java 8u77
>Reporter: Brice Dutheil
> Attachments: app-producer-config.log, kafka-broker-logs.zip
>
>
> We are regularly seeing OOME errors on a Kafka producer; we first saw:
> {code}
> java.lang.OutOfMemoryError: Java heap space
> at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.8.0_77]
> at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_77]
> at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at org.apache.kafka.common.network.Selector.poll(Selector.java:286) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_77]
> {code}
> This line refers to a buffer allocation {{ByteBuffer.allocate(receiveSize)}} 
> (see 
> https://github.com/apache/kafka/blob/0.9.0.1/clients/src/main/java/org/apache/kafka/common/network/NetworkReceive.java#L93)
> Usually the app runs fine within a 200/400 MB heap and a 64 MB Metaspace, and 
> we are producing small messages, 500 B at most.
> Also, the error doesn't appear in the development environment. In order to 
> identify the issue we tweaked the code to report the actual allocation size, 
> and we got this stack:
> {code}
> 09:55:49.484 [auth] [kafka-producer-network-thread | producer-1] WARN  
> o.a.k.c.n.NetworkReceive HEAP-ISSUE: constructor : Integer='-1', String='-1'
> 09:55:49.485 [auth] [kafka-producer-network-thread | producer-1] WARN  
> o.a.k.c.n.NetworkReceive HEAP-ISSUE: method : 
> NetworkReceive.readFromReadableChannel.receiveSize=1213486160
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to /tmp/tomcat.hprof ...
> Heap dump file created [69583827 bytes in 0.365 secs]
> 09:55:50.324 [auth] [kafka-producer-network-thread | producer-1] ERROR 
> o.a.k.c.utils.KafkaThread Uncaught exception in kafka-producer-network-thread 
> | producer-1: 
> java.lang.OutOfMemoryError: Java heap space
>   at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.8.0_77]
>   at java.nio.ByteBuffer.allocate(ByteBuffe

[jira] [Commented] (KAFKA-3990) Kafka New Producer may raise an OutOfMemoryError

2016-07-29 Thread Brice Dutheil (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399572#comment-15399572
 ] 

Brice Dutheil commented on KAFKA-3990:
--

Hi, after further investigation we found out that the issue arose because we 
switched from bamboo to marathon-lb, and marathon-lb opens HTTP port 9091 
(https://github.com/mesosphere/marathon-lb#operational-best-practices); we 
missed that during the upgrade.

{code}
> curl -v dockerhost:9091
* About to connect() to dockerhost port 9091 (#0)
*   Trying 172.17.42.1...
* Connected to dockerhost (172.17.42.1) port 9091 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: dockerhost:9091
> Accept: */*
>
* Empty reply from server
* Connection #0 to host dockerhost left intact
curl: (52) Empty reply from server
{code}

However I'm surprised Kafka / clients don't check the validity of the payload.

> Kafka New Producer may raise an OutOfMemoryError
> 
>
> Key: KAFKA-3990
> URL: https://issues.apache.org/jira/browse/KAFKA-3990
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
> Environment: Docker, Base image : CentOS
> Java 8u77
>Reporter: Brice Dutheil
> Attachments: app-producer-config.log, kafka-broker-logs.zip
>
>
> We are regularly seeing OOME errors on a Kafka producer; we first saw:
> {code}
> java.lang.OutOfMemoryError: Java heap space
> at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.8.0_77]
> at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_77]
> at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at org.apache.kafka.common.network.Selector.poll(Selector.java:286) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_77]
> {code}
> This line refers to a buffer allocation {{ByteBuffer.allocate(receiveSize)}} 
> (see 
> https://github.com/apache/kafka/blob/0.9.0.1/clients/src/main/java/org/apache/kafka/common/network/NetworkReceive.java#L93)
> Usually the app runs fine within a 200/400 MB heap and a 64 MB Metaspace, and 
> we are producing small messages, 500 B at most.
> Also, the error doesn't appear in the development environment. In order to 
> identify the issue we tweaked the code to report the actual allocation size, 
> and we got this stack:
> {code}
> 09:55:49.484 [auth] [kafka-producer-network-thread | producer-1] WARN  
> o.a.k.c.n.NetworkReceive HEAP-ISSUE: constructor : Integer='-1', String='-1'
> 09:55:49.485 [auth] [kafka-producer-network-thread | producer-1] WARN  
> o.a.k.c.n.NetworkReceive HEAP-ISSUE: method : 
> NetworkReceive.readFromReadableChannel.receiveSize=1213486160
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to /tmp/tomcat.hprof ...
> Heap dump file created [69583827 bytes in 0.365 secs]
> 09:55:50.324 [auth] [kafka-producer-network-thread | producer-1] ERROR 
> o.a.k.c.utils.KafkaThread Uncaught exception in kafka-producer-network-thread 
> | producer-1: 
> java.lang.OutOfMemoryError: Java heap space
>   at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.8.0_77]
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_77]
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
>  ~[kafka-clients-0.9.0.1.jar:na]
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>  ~[kafka-clients-0.9.0.1.jar:na]
>   at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153) 
> ~[kafka-clients-0.9.0.1.jar:na]
>   at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134) 
> ~[kafka-clients-0.9.0.1.jar:na]
>   at org.apache.kafka.common.network.Selector.poll(Selector.java:286) 
> ~[kafka-clients-0.9.0.1.jar:na]
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256) 
> ~[kafka-clients-0.9.0.1.jar:na]
>   at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) 
> ~[kafka-clients-0.9.0.1.jar:na]
>   at org.apa

Re: [kafka-clients] [VOTE] 0.10.0.1 RC0

2016-07-29 Thread Dana Powers
+1

Tested against the kafka-python integration test suite: pass.

Aside: as the scope of kafka gets bigger, it may be useful to organize
release notes into functional groups like core, brokers, clients,
kafka-streams, etc. I've found this useful when organizing
kafka-python release notes.

-Dana

On Fri, Jul 29, 2016 at 7:46 AM, Ismael Juma  wrote:
> Hello Kafka users, developers and client-developers,
>
> This is the first candidate for the release of Apache Kafka 0.10.0.1. This
> is a bug fix release and it includes fixes and improvements from 50 JIRAs
> (including a few critical bugs). See the release notes for more details:
>
> http://home.apache.org/~ijuma/kafka-0.10.0.1-rc0/RELEASE_NOTES.html
>
> *** Please download, test and vote by Monday, 1 August, 8am PT ***
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~ijuma/kafka-0.10.0.1-rc0/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging
>
> * Javadoc:
> http://home.apache.org/~ijuma/kafka-0.10.0.1-rc0/javadoc/
>
> * Tag to be voted upon (off 0.10.0 branch) is the 0.10.0.1-rc0 tag:
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=0c2322c2cf7ab7909cfd8b834d1d2fffc34db109
>
> * Documentation:
> http://kafka.apache.org/0100/documentation.html
>
> * Protocol:
> http://kafka.apache.org/0100/protocol.html
>
> * Successful Jenkins builds for the 0.10.0 branch:
> Unit/integration tests: https://builds.apache.org/job/kafka-0.10.0-jdk7/170/
> System tests: https://jenkins.confluent.io/job/system-test-kafka-0.10.0/130/
>
> Thanks,
> Ismael
>
> --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscr...@googlegroups.com.
> To post to this group, send email to kafka-clie...@googlegroups.com.
> Visit this group at https://groups.google.com/group/kafka-clients.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/kafka-clients/CAD5tkZYz8fbLAodpqKg5eRiCsm4ze9QK3ufTz3Q4U%3DGs0CRb1A%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.


[GitHub] kafka pull request #1682: HOTFIX: non-unique state.dirs in integration tests...

2016-07-29 Thread dguy
GitHub user dguy opened a pull request:

https://github.com/apache/kafka/pull/1682

HOTFIX: non-unique state.dirs in integration tests causing build to hang

Three Streams integration tests were using the same directory for the 
state.dir config. This was causing the build to hang when run in parallel mode.
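For context, a minimal sketch of the kind of change involved (assumed test code, not the actual patch): each test supplies its own temporary state directory instead of a shared one.

{code}
import java.nio.file.Files;
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

// Assumed test setup (not the actual patch): a unique state.dir per test keeps
// parallel test JVMs from contending on the same state directory.
public class UniqueStateDirExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(StreamsConfig.STATE_DIR_CONFIG,
                  Files.createTempDirectory("kafka-streams-it-").toString());
        System.out.println(props.getProperty(StreamsConfig.STATE_DIR_CONFIG));
    }
}
{code}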

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dguy/kafka fix-state-dir

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1682.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1682


commit 4c71d2a0fbf254448c196c78124f9449aa3d3f8b
Author: Damian Guy 
Date:   2016-07-29T15:46:21Z

unique state.dirs in integration tests




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3766) Unhandled "not enough replicas" errors in SyncGroup

2016-07-29 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399520#comment-15399520
 ] 

Dustin Cote commented on KAFKA-3766:


Marking this as a dupe of KAFKA-3590.  I've tried to incorporate suggestion #2 
in my PR over there.

> Unhandled "not enough replicas" errors in SyncGroup
> ---
>
> Key: KAFKA-3766
> URL: https://issues.apache.org/jira/browse/KAFKA-3766
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>
> Caught by [~ijuma]. We seem to be missing at least a couple of error codes when 
> handling the append log response when writing group metadata to the offsets 
> topic in the SyncGroup handler. In particular, we are missing checks for 
> NOT_ENOUGH_REPLICAS and NOT_ENOUGH_REPLICAS_AFTER_APPEND. Currently these 
> errors are returned directly in the SyncGroup response and cause an exception 
> to be raised to the user.
> There are two options to fix this problem:
> 1. We can continue to return these error codes in the sync group response and 
> add a handler on the client side to retry.
> 2. We can convert the errors on the server to something like 
> COORDINATOR_NOT_AVAILABLE, which will cause the client to retry with the 
> existing logic.
> The second option seems a little nicer to avoid exposing the internal 
> implementation of the SyncGroup request (i.e. that we write group metadata to 
> a partition). It also has the nice side effect of fixing old clients 
> automatically when the server is upgraded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3590) KafkaConsumer fails with "Messages are rejected since there are fewer in-sync replicas than required." when polling

2016-07-29 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399518#comment-15399518
 ] 

Dustin Cote commented on KAFKA-3590:


Thanks [~ijuma], I didn't see that one.  Updated the PR to move toward Jason's 
recommendation.

> KafkaConsumer fails with "Messages are rejected since there are fewer in-sync 
> replicas than required." when polling
> ---
>
> Key: KAFKA-3590
> URL: https://issues.apache.org/jira/browse/KAFKA-3590
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
> Environment: JDK1.8 Ubuntu 14.04
>Reporter: Sergey Alaev
>Assignee: Dustin Cote
>
> KafkaConsumer.poll() fails with "Messages are rejected since there are fewer 
> in-sync replicas than required." Isn't this message about the minimum number of 
> ISRs when *sending* messages?
> Stacktrace:
> org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: 
> Messages are rejected since there are fewer in-sync replicas than required.
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:444)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:411)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:665)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:644)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:222)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:311)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:890)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853) 
> ~[kafka-clients-0.9.0.1.jar:na]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3590) KafkaConsumer fails with "Messages are rejected since there are fewer in-sync replicas than required." when polling

2016-07-29 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399465#comment-15399465
 ] 

Ismael Juma commented on KAFKA-3590:


This looks similar to KAFKA-3766

> KafkaConsumer fails with "Messages are rejected since there are fewer in-sync 
> replicas than required." when polling
> ---
>
> Key: KAFKA-3590
> URL: https://issues.apache.org/jira/browse/KAFKA-3590
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
> Environment: JDK1.8 Ubuntu 14.04
>Reporter: Sergey Alaev
>Assignee: Dustin Cote
>
> KafkaConsumer.poll() fails with "Messages are rejected since there are fewer 
> in-sync replicas than required." Isn't this message about the minimum number of 
> ISRs when *sending* messages?
> Stacktrace:
> org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: 
> Messages are rejected since there are fewer in-sync replicas than required.
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:444)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:411)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:665)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:644)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:222)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:311)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:890)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853) 
> ~[kafka-clients-0.9.0.1.jar:na]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-3590) KafkaConsumer fails with "Messages are rejected since there are fewer in-sync replicas than required." when polling

2016-07-29 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399465#comment-15399465
 ] 

Ismael Juma edited comment on KAFKA-3590 at 7/29/16 3:08 PM:
-

This looks similar to KAFKA-3766; Jason suggests a different solution there.


was (Author: ijuma):
This looks similar to KAFKA-3766

> KafkaConsumer fails with "Messages are rejected since there are fewer in-sync 
> replicas than required." when polling
> ---
>
> Key: KAFKA-3590
> URL: https://issues.apache.org/jira/browse/KAFKA-3590
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
> Environment: JDK1.8 Ubuntu 14.04
>Reporter: Sergey Alaev
>Assignee: Dustin Cote
>
> KafkaConsumer.poll() fails with "Messages are rejected since there are fewer 
> in-sync replicas than required." Isn't this message about the minimum number of 
> ISRs when *sending* messages?
> Stacktrace:
> org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: 
> Messages are rejected since there are fewer in-sync replicas than required.
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:444)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:411)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:665)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:644)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:222)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:311)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:890)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853) 
> ~[kafka-clients-0.9.0.1.jar:na]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3590) KafkaConsumer fails with "Messages are rejected since there are fewer in-sync replicas than required." when polling

2016-07-29 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote updated KAFKA-3590:
---
Status: Patch Available  (was: Open)

> KafkaConsumer fails with "Messages are rejected since there are fewer in-sync 
> replicas than required." when polling
> ---
>
> Key: KAFKA-3590
> URL: https://issues.apache.org/jira/browse/KAFKA-3590
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
> Environment: JDK1.8 Ubuntu 14.04
>Reporter: Sergey Alaev
>Assignee: Dustin Cote
>
> KafkaConsumer.poll() fails with "Messages are rejected since there are fewer 
> in-sync replicas than required." Isn't this message about the minimum number 
> of ISRs when *sending* messages?
> Stacktrace:
> org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: 
> Messages are rejected since there are fewer in-sync replicas than required.
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:444)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:411)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:665)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:644)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:222)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:311)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:890)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853) 
> ~[kafka-clients-0.9.0.1.jar:na]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3590) KafkaConsumer fails with "Messages are rejected since there are fewer in-sync replicas than required." when polling

2016-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399460#comment-15399460
 ] 

ASF GitHub Bot commented on KAFKA-3590:
---

GitHub user cotedm opened a pull request:

https://github.com/apache/kafka/pull/1681

KAFKA-3590: KafkaConsumer fails with "Messages are rejected since there are 
fewer in-sync replicas than required." when polling

This just improves the error message, since it's confusing for a consumer to 
surface an error you'd normally see from a producer. 
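
Until the clearer message lands, the failure surfaces from {{poll()}} as a 
plain {{KafkaException}} raised during group coordination. A minimal sketch of 
where it appears, with broker address, topic, and group name as placeholder 
assumptions:

{code}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.KafkaException;

public class SyncGroupErrorDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "demo-group");              // assumed group name
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test")); // assumed topic
            while (true) {
                // SyncGroup errors propagate out of poll() as a KafkaException;
                // before this patch the text is the producer-style ISR message.
                consumer.poll(100);
            }
        } catch (KafkaException e) {
            // Per this thread, the likely root cause is the __consumer_offsets
            // topic having fewer in-sync replicas than its min.insync.replicas.
            System.err.println("Group coordination failed: " + e.getMessage());
        }
    }
}
{code}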

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cotedm/kafka KAFKA-3590

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1681.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1681


commit 9873e9fa94c8781dccb390cde890c48f9aa5ffc5
Author: Dustin Cote 
Date:   2016-07-29T14:59:10Z

Handle error case for under replicated __consumer_offsets by logging a 
better message

commit 45c0d6cd9e4192fae6f44a308812679f8fdb3fd3
Author: Dustin Cote 
Date:   2016-07-29T15:02:28Z

updated comment




> KafkaConsumer fails with "Messages are rejected since there are fewer in-sync 
> replicas than required." when polling
> ---
>
> Key: KAFKA-3590
> URL: https://issues.apache.org/jira/browse/KAFKA-3590
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
> Environment: JDK1.8 Ubuntu 14.04
>Reporter: Sergey Alaev
>Assignee: Dustin Cote
>
> KafkaConsumer.poll() fails with "Messages are rejected since there are fewer 
> in-sync replicas than required." Isn't this message about the minimum number 
> of ISRs when *sending* messages?
> Stacktrace:
> org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: 
> Messages are rejected since there are fewer in-sync replicas than required.
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:444)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupRequestHandler.handle(AbstractCoordinator.java:411)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:665)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:644)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:222)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:311)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:890)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853) 
> ~[kafka-clients-0.9.0.1.jar:na]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[GitHub] kafka pull request #1681: KAFKA-3590: KafkaConsumer fails with "Messages are...

2016-07-29 Thread cotedm
GitHub user cotedm opened a pull request:

https://github.com/apache/kafka/pull/1681

KAFKA-3590: KafkaConsumer fails with "Messages are rejected since there are 
fewer in-sync replicas than required." when polling

This just improves the error message, since it's confusing for a consumer to 
surface an error you'd normally see from a producer. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cotedm/kafka KAFKA-3590

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1681.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1681


commit 9873e9fa94c8781dccb390cde890c48f9aa5ffc5
Author: Dustin Cote 
Date:   2016-07-29T14:59:10Z

Handle error case for under replicated __consumer_offsets by logging a 
better message

commit 45c0d6cd9e4192fae6f44a308812679f8fdb3fd3
Author: Dustin Cote 
Date:   2016-07-29T15:02:28Z

updated comment




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [VOTE] 0.10.0.1 RC0

2016-07-29 Thread Harsha Chintalapani
Hi Ismael,
 I would like this JIRA to be included in the minor release
https://issues.apache.org/jira/browse/KAFKA-3950
Thanks,
Harsha
On Fri, Jul 29, 2016 at 7:46 AM Ismael Juma  wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the first candidate for the release of Apache Kafka 0.10.0.1. This
> is a bug fix release and it includes fixes and improvements from 50 JIRAs
> (including a few critical bugs). See the release notes for more details:
>
> http://home.apache.org/~ijuma/kafka-0.10.0.1-rc0/RELEASE_NOTES.html
>
> *** Please download, test and vote by Monday, 1 August, 8am PT ***
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~ijuma/kafka-0.10.0.1-rc0/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging
>
> * Javadoc:
> http://home.apache.org/~ijuma/kafka-0.10.0.1-rc0/javadoc/
>
> * Tag to be voted upon (off 0.10.0 branch) is the 0.10.0.1-rc0 tag:
>
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=0c2322c2cf7ab7909cfd8b834d1d2fffc34db109
>
> * Documentation:
> http://kafka.apache.org/0100/documentation.html
>
> * Protocol:
> http://kafka.apache.org/0100/protocol.html
>
> * Successful Jenkins builds for the 0.10.0 branch:
> Unit/integration tests:
> https://builds.apache.org/job/kafka-0.10.0-jdk7/170/
> System tests:
> https://jenkins.confluent.io/job/system-test-kafka-0.10.0/130/
>
> Thanks,
> Ismael
>


[jira] [Commented] (KAFKA-3950) kafka mirror maker tool is not respecting whitelist option

2016-07-29 Thread Manikumar Reddy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399444#comment-15399444
 ] 

Manikumar Reddy commented on KAFKA-3950:


[~ijuma] Can we include this in the 0.10.0.1 release?

> kafka mirror maker tool is not respecting whitelist option
> --
>
> Key: KAFKA-3950
> URL: https://issues.apache.org/jira/browse/KAFKA-3950
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raghav Kumar Gautam
>Assignee: Manikumar Reddy
>Priority: Critical
>
> A mirror maker launched like this:
> {code}
> /usr/bin/kinit -k -t /home/kfktest/hadoopqa/keytabs/kfktest.headless.keytab 
> kfkt...@example.com
> JAVA_HOME=/usr/jdk64/jdk1.8.0_77 JMX_PORT=9112 
> /usr/kafka/bin/kafka-run-class.sh kafka.tools.MirrorMaker --consumer.config 
> /usr/kafka/system_test/mirror_maker_testsuite/testcase_15001/config/mirror_consumer_12.properties
>  --producer.config 
> /usr/kafka/system_test/mirror_maker_testsuite/testcase_15001/config/mirror_producer_12.properties
>  --new.consumer --whitelist="test.*" >>  
> /usr/kafka/system_test/mirror_maker_testsuite/testcase_15001/logs/mirror_maker-12/mirror_maker_12.log
>  2>&1 & echo pid:$! >  
> /usr/kafka/system_test/mirror_maker_testsuite/testcase_15001/logs/mirror_maker-12/entity_12_pid
> {code}
> Leads to a TopicAuthorizationException:
> {code}
> WARN Error while fetching metadata with correlation id 44 : 
> {__consumer_offsets=TOPIC_AUTHORIZATION_FAILED} 
> (org.apache.kafka.clients.NetworkClient)
> [2016-06-20 13:24:49,983] FATAL [mirrormaker-thread-0] Mirror maker thread 
> failure due to  (kafka.tools.MirrorMaker$MirrorMakerThread)
> org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to 
> access topics: [__consumer_offsets]
> {code}
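
For context: with {{--new.consumer}}, MirrorMaker turns the {{--whitelist}} 
value into a regex and subscribes by pattern, and the new consumer's 
{{exclude.internal.topics}} setting (default true) is meant to keep internal 
topics such as {{__consumer_offsets}} out of the match. A minimal sketch of 
that subscription path, with broker address, group name, and deserializers as 
placeholder assumptions:

{code}
import java.util.Collection;
import java.util.Properties;
import java.util.regex.Pattern;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class WhitelistSubscriptionDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "mirror-demo");             // assumed group name
        props.put("exclude.internal.topics", "true");     // default in the new consumer
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            // Pattern subscription is what --whitelist="test.*" maps onto.
            consumer.subscribe(Pattern.compile("test.*"), new ConsumerRebalanceListener() {
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) { }
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) { }
            });
            consumer.poll(100); // metadata/authorization errors would surface here
        }
    }
}
{code}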



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[VOTE] 0.10.0.1 RC0

2016-07-29 Thread Ismael Juma
Hello Kafka users, developers and client-developers,

This is the first candidate for the release of Apache Kafka 0.10.0.1. This
is a bug fix release and it includes fixes and improvements from 50 JIRAs
(including a few critical bugs). See the release notes for more details:

http://home.apache.org/~ijuma/kafka-0.10.0.1-rc0/RELEASE_NOTES.html

*** Please download, test and vote by Monday, 1 August, 8am PT ***

Kafka's KEYS file containing PGP keys we use to sign the release:
http://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
http://home.apache.org/~ijuma/kafka-0.10.0.1-rc0/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging

* Javadoc:
http://home.apache.org/~ijuma/kafka-0.10.0.1-rc0/javadoc/

* Tag to be voted upon (off 0.10.0 branch) is the 0.10.0.1-rc0 tag:
https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=0c2322c2cf7ab7909cfd8b834d1d2fffc34db109

* Documentation:
http://kafka.apache.org/0100/documentation.html

* Protocol:
http://kafka.apache.org/0100/protocol.html

* Successful Jenkins builds for the 0.10.0 branch:
Unit/integration tests: https://builds.apache.org/job/kafka-0.10.0-jdk7/170/
System tests: https://jenkins.confluent.io/job/system-test-kafka-0.10.0/130/

Thanks,
Ismael


Re: Jars in Kafka 0.10

2016-07-29 Thread Gerard Klijs
No, if you don't use Streams you don't need them. If you have no clients
(so also no mirror maker) running on the same machine, you also don't need
the client jar; and if you run ZooKeeper separately, you don't need the
ZooKeeper jars either.

On Fri, Jul 29, 2016 at 4:22 PM Bhuvaneswaran Gopalasami <
bhuvanragha...@gmail.com> wrote:

> I have recently started looking into Kafka and noticed that the number of
> jars in Kafka 0.10 has increased compared to 0.8.2. Do we really need all of
> those libraries to run Kafka?
>
> Thanks,
> Bhuvanes
>


[jira] [Commented] (KAFKA-3990) Kafka New Producer may raise an OutOfMemoryError

2016-07-29 Thread Brice Dutheil (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399370#comment-15399370
 ] 

Brice Dutheil commented on KAFKA-3990:
--

Hi all, sorry for the delayed response, I have been busy with other stuff.

Yes, the broker is 0.9.0.1 as well. It runs in a docker container too.
I attached the broker logs. We restarted the single-instance cluster (~ 13:20), 
and a few minutes later (~ 13:34) we ran the application, and the app faced the 
same problem with this big message.

This got me curious: I had only looked at server.log, but controller.log shows 
an OOME as well, right at broker instance start:

{code}
[2016-07-29 13:20:34,366] WARN [Controller-1-to-broker-1-send-thread], 
Controller 1 epoch 1 fails to send request 
{controller_id=1,controller_epoch=1,partition_states=[],live_brokers=[{id=1,end_points=[{port=9091,host=dockerhost,security_protocol_type=0}]}]}
 to broker Node(1, dockerhost, 9091). Reconnecting to broker. 
(kafka.controller.RequestSendThread)
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
at 
org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
at 
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
at 
org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153)
at 
org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134)
at org.apache.kafka.common.network.Selector.poll(Selector.java:286)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256)
at 
kafka.utils.NetworkClientBlockingOps$.recurse$1(NetworkClientBlockingOps.scala:128)
at 
kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollUntilFound$extension(NetworkClientBlockingOps.scala:139)
at 
kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
at 
kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:180)
at 
kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:171)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
{code}








> Kafka New Producer may raise an OutOfMemoryError
> 
>
> Key: KAFKA-3990
> URL: https://issues.apache.org/jira/browse/KAFKA-3990
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
> Environment: Docker, Base image : CentOS
> Java 8u77
>Reporter: Brice Dutheil
> Attachments: app-producer-config.log, kafka-broker-logs.zip
>
>
> We are regularly seeing OOME errors on a Kafka producer; we first saw:
> {code}
> java.lang.OutOfMemoryError: Java heap space
> at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57) ~[na:1.8.0_77]
> at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_77]
> at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at org.apache.kafka.common.network.Selector.poll(Selector.java:286) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_77]
> {code}
> This line refers to a buffer allocation {{ByteBuffer.allocate(receiveSize)}} 
> (see 
> https://github.com/apache/kafka/blob/0.9.0.1/clients/src/main/java/org/apache/kafka/common/network/NetworkReceive.java#L93)
> Usually the app runs fine within a 200/400 MB heap and a 64 MB Metaspace, and 
> we are producing small messages, 500 B at most.
> Also, the error doesn't appear in the development environment. In order to 
> identify the issue we tweaked the code to log the actual allocation size, and 
> got this stack:
> {code}
> 09:55:49.484 [auth] [kafka-producer-network-thread | producer-1] WARN  
> o.a.k.c.n.NetworkReceive HEAP-ISSUE: constructor : Integer='-1', String='-1'
> 09:55:49.485 [auth] [kafka-producer-network-thread | producer-1] WARN

Jars in Kafka 0.10

2016-07-29 Thread Bhuvaneswaran Gopalasami
I have recently started looking into Kafka and noticed that the number of
jars in Kafka 0.10 has increased compared to 0.8.2. Do we really need all of
those libraries to run Kafka?

Thanks,
Bhuvanes


[jira] [Updated] (KAFKA-3990) Kafka New Producer may raise an OutOfMemoryError

2016-07-29 Thread Brice Dutheil (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brice Dutheil updated KAFKA-3990:
-
Attachment: app-producer-config.log
kafka-broker-logs.zip

> Kafka New Producer may raise an OutOfMemoryError
> 
>
> Key: KAFKA-3990
> URL: https://issues.apache.org/jira/browse/KAFKA-3990
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
> Environment: Docker, Base image : CentOS
> Java 8u77
>Reporter: Brice Dutheil
> Attachments: app-producer-config.log, kafka-broker-logs.zip
>
>
> We are regularly seeing OOME errors on a Kafka producer; we first saw:
> {code}
> java.lang.OutOfMemoryError: Java heap space
> at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57) ~[na:1.8.0_77]
> at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_77]
> at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at org.apache.kafka.common.network.Selector.poll(Selector.java:286) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_77]
> {code}
> This line refers to a buffer allocation {{ByteBuffer.allocate(receiveSize)}} 
> (see 
> https://github.com/apache/kafka/blob/0.9.0.1/clients/src/main/java/org/apache/kafka/common/network/NetworkReceive.java#L93)
> Usually the app runs fine within a 200/400 MB heap and a 64 MB Metaspace, and 
> we are producing small messages, 500 B at most.
> Also, the error doesn't appear in the development environment. In order to 
> identify the issue we tweaked the code to log the actual allocation size, and 
> got this stack:
> {code}
> 09:55:49.484 [auth] [kafka-producer-network-thread | producer-1] WARN  
> o.a.k.c.n.NetworkReceive HEAP-ISSUE: constructor : Integer='-1', String='-1'
> 09:55:49.485 [auth] [kafka-producer-network-thread | producer-1] WARN  
> o.a.k.c.n.NetworkReceive HEAP-ISSUE: method : 
> NetworkReceive.readFromReadableChannel.receiveSize=1213486160
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to /tmp/tomcat.hprof ...
> Heap dump file created [69583827 bytes in 0.365 secs]
> 09:55:50.324 [auth] [kafka-producer-network-thread | producer-1] ERROR 
> o.a.k.c.utils.KafkaThread Uncaught exception in kafka-producer-network-thread 
> | producer-1: 
> java.lang.OutOfMemoryError: Java heap space
>   at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57) ~[na:1.8.0_77]
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_77]
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
>  ~[kafka-clients-0.9.0.1.jar:na]
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>  ~[kafka-clients-0.9.0.1.jar:na]
>   at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153) 
> ~[kafka-clients-0.9.0.1.jar:na]
>   at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134) 
> ~[kafka-clients-0.9.0.1.jar:na]
>   at org.apache.kafka.common.network.Selector.poll(Selector.java:286) 
> ~[kafka-clients-0.9.0.1.jar:na]
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256) 
> ~[kafka-clients-0.9.0.1.jar:na]
>   at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) 
> ~[kafka-clients-0.9.0.1.jar:na]
>   at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) 
> ~[kafka-clients-0.9.0.1.jar:na]
>   at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_77]
> {code}
> Notice the size to allocate: {{1213486160}}, ~1.2 GB. I'm not yet sure how 
> this size is initialised.
> Notice as well that every time this OOME appears, the {{NetworkReceive}} 
> constructor at 
> https://github.com/apache/kafka/blob/0.9.0.1/clients/src/main/java/org/apache/kafka/common/network/NetworkReceive.java#L49
>  receives the parameters {{maxSize=-1}}, {{source="-1"}}
> We may have missed some configuration in our setup, but Kafka clients 
> shouldn't raise an OOME. For reference the producer is initial
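
One observation that may explain the size: {{1213486160}} is exactly 
{{0x48545450}}, i.e. the ASCII bytes "HTTP". That suggests (a hypothesis, not 
confirmed in this thread) that the client read a non-Kafka response, for 
example an HTTP reply from a proxy or a misconfigured endpoint, and 
interpreted its first four bytes as the message size. A minimal sketch that 
decodes the reported value:

{code}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ReceiveSizeDecoder {
    public static void main(String[] args) {
        int receiveSize = 1213486160; // the value logged by the tweaked client
        // Write the int back out big-endian, the way it arrived on the wire.
        byte[] raw = ByteBuffer.allocate(4).putInt(receiveSize).array();
        // Prints "HTTP": bytes 0x48 0x54 0x54 0x50.
        System.out.println(new String(raw, StandardCharsets.US_ASCII));
    }
}
{code}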

[GitHub] kafka pull request #1680: KAFKA-3946: Protocol guide should say that Produce...

2016-07-29 Thread mimaison
GitHub user mimaison opened a pull request:

https://github.com/apache/kafka/pull/1680

KAFKA-3946: Protocol guide should say that Produce request acks can only be 0, 1, or -1

Rephrased the documentation string for the Produce request
Updated the acks configuration docs to state that -1, 0, and 1 are the only 
allowed values

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mimaison/kafka KAFKA-3946

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1680.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1680


commit afa886cf387b3b184f127c2b9b62e480e25d6730
Author: Mickael Maison 
Date:   2016-07-29T10:13:34Z

KAFKA-3946: Protocol guide should say that Produce request acks can only be 
0, 1, or -1

Rephrased the documentation string for the Produce request
Updated the acks configuration docs to state that -1, 0, and 1 are the only 
allowed values




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3946) Protocol guide should say that Produce request acks can only be 0, 1, or -1

2016-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399367#comment-15399367
 ] 

ASF GitHub Bot commented on KAFKA-3946:
---

GitHub user mimaison opened a pull request:

https://github.com/apache/kafka/pull/1680

KAFKA-3946: Protocol guide should say that Produce request acks can only be 0, 1, or -1

Rephrased the documentation string for the Produce request
Updated the acks configuration docs to state that -1, 0, and 1 are the only 
allowed values

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mimaison/kafka KAFKA-3946

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1680.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1680


commit afa886cf387b3b184f127c2b9b62e480e25d6730
Author: Mickael Maison 
Date:   2016-07-29T10:13:34Z

KAFKA-3946: Protocol guide should say that Produce request acks can only be 
0, 1, or -1

Rephrased the documentation string for the Produce request
Updated the acks configuration docs to state that -1, 0, and 1 are the only 
allowed values




> Protocol guide should say that Produce request acks can only be 0, 1, or -1
> ---
>
> Key: KAFKA-3946
> URL: https://issues.apache.org/jira/browse/KAFKA-3946
> Project: Kafka
>  Issue Type: Bug
>Reporter: James Cheng
>Assignee: Mickael Maison
>Priority: Minor
>
> The protocol guide at http://kafka.apache.org/protocol.html#protocol_messages 
> says that for Produce requests, acks means:
> The number of nodes that should replicate the produce before returning. -1 
> indicates the full ISR.
> This seems to imply that you can specify values of 2, 3, 4, etc.
> It would be clearer if the description was more explicit. It should say that 
> the only valid values are 0, 1, and -1, per the code at 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/ReplicaManager.scala#L382-L384
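
On the client side, a minimal sketch of the allowed settings (broker address 
and topic are placeholder assumptions; the Java producer also accepts "all" as 
an alias for -1):

{code}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AcksDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        // Valid values: "0" (no broker ack), "1" (leader ack only) and
        // "-1"/"all" (wait for the full ISR); 2, 3, 4, ... are not accepted.
        props.put("acks", "all");
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test", "key", "value")); // assumed topic
        }
    }
}
{code}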



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-3946) Protocol guide should say that Produce request acks can only be 0, 1, or -1

2016-07-29 Thread Mickael Maison (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mickael Maison reassigned KAFKA-3946:
-

Assignee: Mickael Maison

> Protocol guide should say that Produce request acks can only be 0, 1, or -1
> ---
>
> Key: KAFKA-3946
> URL: https://issues.apache.org/jira/browse/KAFKA-3946
> Project: Kafka
>  Issue Type: Bug
>Reporter: James Cheng
>Assignee: Mickael Maison
>Priority: Minor
>
> The protocol guide at http://kafka.apache.org/protocol.html#protocol_messages 
> says that for Produce requests, acks means:
> The number of nodes that should replicate the produce before returning. -1 
> indicates the full ISR.
> This seems to imply that you can specify values of 2, 3, 4, etc.
> It would be clearer if the description was more explicit. It should say that 
> the only valid values are 0, 1, and -1, per the code at 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/ReplicaManager.scala#L382-L384



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1679: MINOR: replace reference to HoppingWindows in stre...

2016-07-29 Thread dguy
GitHub user dguy opened a pull request:

https://github.com/apache/kafka/pull/1679

MINOR: replace reference to HoppingWindows in streams.html

HoppingWindows was removed prior to the 0.10.0 release. I've updated the 
doc to refer to the correct TimeWindows.
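
For anyone updating code alongside the doc, a minimal sketch of the 
replacement, assuming the 0.10.0-era API where windows carry a name; the 
window name and durations are placeholders:

{code}
import org.apache.kafka.streams.kstream.TimeWindows;

public class TimeWindowsDemo {
    public static void main(String[] args) {
        // A 60s window advancing every 10s: the hopping behaviour that the
        // removed HoppingWindows class used to express.
        TimeWindows hopping = TimeWindows.of("demo-window", 60000L).advanceBy(10000L);
        System.out.println(hopping);
    }
}
{code}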

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dguy/kafka 0.10.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1679.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1679


commit 0b893b1cd877075507863aefa3352ec3b2fc419d
Author: Damian Guy 
Date:   2016-07-29T09:47:29Z

remove reference to HoppingWindows as it is invalid




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---