[jira] [Commented] (KAFKA-2397) leave group request

2015-09-04 Thread Onur Karaman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730633#comment-14730633
 ] 

Onur Karaman commented on KAFKA-2397:
-

My pull request had diverged from trunk, so I force-pushed a rebase that just 
cleans up the conflicts.

> leave group request
> ---
>
> Key: KAFKA-2397
> URL: https://issues.apache.org/jira/browse/KAFKA-2397
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Onur Karaman
>Assignee: Onur Karaman
>Priority: Minor
> Fix For: 0.8.3
>
>
> Let's say every consumer in a group has session timeout s. Currently, if a 
> consumer leaves the group, the worst case time to stabilize the group is 2s 
> (s to detect the consumer failure + s for the rebalance window). If a 
> consumer instead can declare they are leaving the group, the worst case time 
> to stabilize the group would just be the s associated with the rebalance 
> window.
> This is a low priority optimization!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2517) Performance Regression post SSL implementation

2015-09-04 Thread Ben Stopford (JIRA)
Ben Stopford created KAFKA-2517:
---

 Summary: Performance Regression post SSL implementation
 Key: KAFKA-2517
 URL: https://issues.apache.org/jira/browse/KAFKA-2517
 Project: Kafka
  Issue Type: Bug
Reporter: Ben Stopford
Assignee: Ben Stopford
Priority: Critical
 Fix For: 0.8.3


It would appear that we incurred a performance regression on submission of the 
SSL work, affecting the performance of the new Kafka Consumer. 

Running with 1KB messages. MacBook 2.3 GHz Intel Core i7, 8GB, APPLE SSD 
SM256E. Single server instance. All local. 

kafka-consumer-perf-test.sh ... --messages 300  --new-consumer

Pre-SSL changes (commit 503bd36647695e8cc91893ffb80346dd03eb0bc5)
Steady state throughput = 234.8 MB/s
(2861.5913, 234.8261, 3000596, 246233.0543)

Post-SSL changes (commit 13c432f7952de27e9bf8cb4adb33a91ae3a4b738)
Steady state throughput = 178.1 MB/s
(2861.5913, 178.1480, 3000596, 186801.7182)

The implication is a 25% reduction in consumer throughput under these test 
conditions.

This appears to be caused by the use of PlaintextTransportLayer rather than 
SocketChannel in FileMessageSet.writeTo(), meaning a zero-copy transfer is not 
invoked.

Switching to the use of a SocketChannel directly in FileMessageSet.writeTo() 
yields the following result:
Steady state throughput = 281.8 MB/s
(2861.5913, 281.8191, 3000596, 295508.7650)
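
For illustration, a minimal sketch of the zero-copy path at issue, assuming a 
plain FileChannel and SocketChannel (illustrative only, not the actual 
FileMessageSet.writeTo code): FileChannel.transferTo lets the kernel move file 
bytes straight to the socket, and wrapping the destination in another 
WritableByteChannel implementation can keep that fast path from being used, 
which is the regression described above.

    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;

    public final class ZeroCopySend {
        // Transfer up to 'count' bytes of the log file to the socket without
        // copying them through user space (sendfile(2) on Linux).
        public static long send(FileChannel log, SocketChannel socket,
                                long position, long count) throws IOException {
            long written = 0;
            while (written < count) {
                long n = log.transferTo(position + written, count - written, socket);
                if (n <= 0)
                    break; // socket buffer full (non-blocking) or end of file
                written += n;
            }
            return written;
        }
    }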





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2431) Test SSL/TLS impact on performance

2015-09-04 Thread Ben Stopford (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730682#comment-14730682
 ] 

Ben Stopford commented on KAFKA-2431:
-

Post SSL regression in new consumer raised here: 
https://issues.apache.org/jira/browse/KAFKA-2517

> Test SSL/TLS impact on performance
> --
>
> Key: KAFKA-2431
> URL: https://issues.apache.org/jira/browse/KAFKA-2431
> Project: Kafka
>  Issue Type: Sub-task
>  Components: security
>Reporter: Ismael Juma
>Assignee: Ben Stopford
>Priority: Blocker
> Fix For: 0.8.3
>
>
> Test new Producer and new Consumer performance with and without SSL/TLS once 
> the SSL/TLS branch is integrated.
> The ideal scenario is that SSL/TLS would not have an impact if disabled. When 
> enabled, there will be some overhead (encryption and the inability to use 
> `SendFile`) and it will be good to quantify it. The encryption overhead is 
> reduced if recent JDKs are used with CPUs that support AES-specific 
> instructions (https://en.wikipedia.org/wiki/AES_instruction_set).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2477) Replicas spuriously deleting all segments in partition

2015-09-04 Thread JIRA

[ 
https://issues.apache.org/jira/browse/KAFKA-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730713#comment-14730713
 ] 

Håkon Hitland commented on KAFKA-2477:
--

I don't see any out-of-sequence offsets.
Here are a couple of recent examples.
If I run with --deep-iteration, all offsets are present and sequential.
The result on the replica is identical to the leader.
---
[2015-09-02 23:43:03,379] ERROR [Replica Manager on Broker 0]: Error when 
processing fetch request for partition [log.event,3] offset 10591627212 from 
follower with correlation id 391785394. Possible cause: Request for offset 
10591627212 but we only have log segments in the range 10444248800 to 
10591627211. (kafka.server.ReplicaManager)

offset: 10591627210 position: 994954613 isvalid: true payloadsize: 674 magic: 0 compresscodec: SnappyCompressionCodec crc: 4144791071
offset: 10591627211 position: 994955313 isvalid: true payloadsize: 1255 magic: 0 compresscodec: SnappyCompressionCodec crc: 1011806998
offset: 10591627213 position: 994956594 isvalid: true payloadsize: 1460 magic: 0 compresscodec: SnappyCompressionCodec crc: 4145284502
offset: 10591627215 position: 994958080 isvalid: true payloadsize: 1719 magic: 0 compresscodec: SnappyCompressionCodec crc: 18110



[2015-09-03 11:44:02,483] ERROR [Replica Manager on Broker 3]: Error when 
processing fetch request for partition [log.count,5] offset 69746066284 from 
follower with correlation id 239821628. Possible cause: Request for offset 
69746066284 but we only have log segments in the range 68788206610 to 
69746066280. (kafka.server.ReplicaManager)

offset: 69746066278 position: 464897345 isvalid: true payloadsize: 674 magic: 0 compresscodec: SnappyCompressionCodec crc: 3013732329
offset: 69746066279 position: 464898045 isvalid: true payloadsize: 234 magic: 0 compresscodec: SnappyCompressionCodec crc: 3286064200
offset: 69746066283 position: 464898305 isvalid: true payloadsize: 486 magic: 0 compresscodec: SnappyCompressionCodec crc: 747917524
offset: 69746066285 position: 464898817 isvalid: true payloadsize: 342 magic: 0 compresscodec: SnappyCompressionCodec crc: 4283754786
offset: 69746066286 position: 464899185 isvalid: true payloadsize: 233 magic: 0 compresscodec: SnappyCompressionCodec crc: 2129213572

> Replicas spuriously deleting all segments in partition
> --
>
> Key: KAFKA-2477
> URL: https://issues.apache.org/jira/browse/KAFKA-2477
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1
>Reporter: Håkon Hitland
> Attachments: kafka_log.txt
>
>
> We're seeing some strange behaviour in brokers: a replica will sometimes 
> schedule all segments in a partition for deletion, and then immediately start 
> replicating them back, triggering our check for under-replicating topics.
> This happens on average a couple of times a week, for different brokers and 
> topics.
> We have per-topic retention.ms and retention.bytes configuration, the topics 
> where we've seen this happen are hitting the size limit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2517) Performance Regression post SSL implementation

2015-09-04 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730715#comment-14730715
 ] 

Ismael Juma commented on KAFKA-2517:


Performance improves over the baseline then?

> Performance Regression post SSL implementation
> --
>
> Key: KAFKA-2517
> URL: https://issues.apache.org/jira/browse/KAFKA-2517
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ben Stopford
>Assignee: Ben Stopford
>Priority: Critical
> Fix For: 0.8.3
>
>
> It would appear that we incurred a performance regression on submission of 
> the SSL work, affecting the performance of the new Kafka Consumer. 
> Running with 1KB messages. MacBook 2.3 GHz Intel Core i7, 8GB, APPLE SSD 
> SM256E. Single server instance. All local. 
> kafka-consumer-perf-test.sh ... --messages 300  --new-consumer
> Pre-SSL changes (commit 503bd36647695e8cc91893ffb80346dd03eb0bc5)
> Steady state throughput = 234.8 MB/s
> (2861.5913, 234.8261, 3000596, 246233.0543)
> Post-SSL changes (commit 13c432f7952de27e9bf8cb4adb33a91ae3a4b738)
> Steady state throughput = 178.1 MB/s
> (2861.5913, 178.1480, 3000596, 186801.7182)
> The implication is a 25% reduction in consumer throughput under these test 
> conditions. 
> This appears to be caused by the use of PlaintextTransportLayer rather than 
> SocketChannel in FileMessageSet.writeTo(), meaning a zero-copy transfer is 
> not invoked.
> Switching to the use of a SocketChannel directly in FileMessageSet.writeTo() 
> yields the following result:
> Steady state throughput = 281.8 MB/s
> (2861.5913, 281.8191, 3000596, 295508.7650)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2517) Performance Regression post SSL implementation

2015-09-04 Thread Ben Stopford (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730720#comment-14730720
 ] 

Ben Stopford commented on KAFKA-2517:
-

That would be the implication, but I am suspicious of that result in isolation. 
The key point for this JIRA is that there is a problem which we need to fix, 
and there is a potential solution. 

> Performance Regression post SSL implementation
> --
>
> Key: KAFKA-2517
> URL: https://issues.apache.org/jira/browse/KAFKA-2517
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ben Stopford
>Assignee: Ben Stopford
>Priority: Critical
> Fix For: 0.8.3
>
>
> It would appear that we incurred a performance regression on submission of 
> the SSL work, affecting the performance of the new Kafka Consumer. 
> Running with 1KB messages. MacBook 2.3 GHz Intel Core i7, 8GB, APPLE SSD 
> SM256E. Single server instance. All local. 
> kafka-consumer-perf-test.sh ... --messages 300  --new-consumer
> Pre-SSL changes (commit 503bd36647695e8cc91893ffb80346dd03eb0bc5)
> Steady state throughput = 234.8 MB/s
> (2861.5913, 234.8261, 3000596, 246233.0543)
> Post-SSL changes (commit 13c432f7952de27e9bf8cb4adb33a91ae3a4b738)
> Steady state throughput = 178.1 MB/s
> (2861.5913, 178.1480, 3000596, 186801.7182)
> The implication is a 25% reduction in consumer throughput under these test 
> conditions. 
> This appears to be caused by the use of PlaintextTransportLayer rather than 
> SocketChannel in FileMessageSet.writeTo(), meaning a zero-copy transfer is 
> not invoked.
> Switching to the use of a SocketChannel directly in FileMessageSet.writeTo() 
> yields the following result:
> Steady state throughput = 281.8 MB/s
> (2861.5913, 281.8191, 3000596, 295508.7650)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2517) Performance Regression post SSL implementation

2015-09-04 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730726#comment-14730726
 ] 

Ismael Juma commented on KAFKA-2517:


Yes, sure.

> Performance Regression post SSL implementation
> --
>
> Key: KAFKA-2517
> URL: https://issues.apache.org/jira/browse/KAFKA-2517
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ben Stopford
>Assignee: Ben Stopford
>Priority: Critical
> Fix For: 0.8.3
>
>
> It would appear that we incurred a performance regression on submission of 
> the SSL work, affecting the performance of the new Kafka Consumer. 
> Running with 1KB messages. MacBook 2.3 GHz Intel Core i7, 8GB, APPLE SSD 
> SM256E. Single server instance. All local. 
> kafka-consumer-perf-test.sh ... --messages 300  --new-consumer
> Pre-SSL changes (commit 503bd36647695e8cc91893ffb80346dd03eb0bc5)
> Steady state throughput = 234.8 MB/s
> (2861.5913, 234.8261, 3000596, 246233.0543)
> Post-SSL changes (commit 13c432f7952de27e9bf8cb4adb33a91ae3a4b738)
> Steady state throughput = 178.1 MB/s
> (2861.5913, 178.1480, 3000596, 186801.7182)
> The implication is a 25% reduction in consumer throughput under these test 
> conditions. 
> This appears to be caused by the use of PlaintextTransportLayer rather than 
> SocketChannel in FileMessageSet.writeTo(), meaning a zero-copy transfer is 
> not invoked.
> Switching to the use of a SocketChannel directly in FileMessageSet.writeTo() 
> yields the following result:
> Steady state throughput = 281.8 MB/s
> (2861.5913, 281.8191, 3000596, 295508.7650)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1070) Auto-assign node id

2015-09-04 Thread Adrian Muraru (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730763#comment-14730763
 ] 

Adrian Muraru commented on KAFKA-1070:
--

[~nehanarkhede] For some reason this patch was not applied correctly to the 
trunk branch; I see an {{.orig}} file in the patch here:
https://github.com/apache/kafka/commit/b1b80860a01cc378cfada3549a3480f0773c3ff8#diff-d0716898c8daed9887ef83500ee0e16e

> Auto-assign node id
> ---
>
> Key: KAFKA-1070
> URL: https://issues.apache.org/jira/browse/KAFKA-1070
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jay Kreps
>Assignee: Sriharsha Chintalapani
>  Labels: usability
> Fix For: 0.8.3
>
> Attachments: KAFKA-1070.patch, KAFKA-1070_2014-07-19_16:06:13.patch, 
> KAFKA-1070_2014-07-22_11:34:18.patch, KAFKA-1070_2014-07-24_20:58:17.patch, 
> KAFKA-1070_2014-07-24_21:05:33.patch, KAFKA-1070_2014-08-21_10:26:20.patch, 
> KAFKA-1070_2014-11-20_10:50:04.patch, KAFKA-1070_2014-11-25_20:29:37.patch, 
> KAFKA-1070_2015-01-01_17:39:30.patch, KAFKA-1070_2015-01-12_10:46:54.patch, 
> KAFKA-1070_2015-01-12_18:30:17.patch
>
>
> It would be nice to have Kafka brokers auto-assign node ids rather than 
> having that be a configuration. Having a configuration is irritating because 
> (1) you have to generate a custom config for each broker and (2) even though 
> it is in configuration, changing the node id can cause all kinds of bad 
> things to happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2518) Update NOTICE file

2015-09-04 Thread Flavio Junqueira (JIRA)
Flavio Junqueira created KAFKA-2518:
---

 Summary: Update NOTICE file
 Key: KAFKA-2518
 URL: https://issues.apache.org/jira/browse/KAFKA-2518
 Project: Kafka
  Issue Type: Bug
  Components: packaging
Reporter: Flavio Junqueira
Priority: Blocker


According to this page from ASF legal:

{noformat}
http://www.apache.org/legal/src-headers.html
{noformat}

the years in the NOTICE header should reflect the product name and years of 
distribution of the current and past versions of the product. The current 
NOTICE file says only 2012. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2510) Prevent broker from re-replicating / losing data due to disk misconfiguration

2015-09-04 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730832#comment-14730832
 ] 

Flavio Junqueira commented on KAFKA-2510:
-

The key problem is that of a broker waking up and not finding data on disk. In 
such a scenario, has the broker lost its disk state (through misconfiguration, 
perhaps), or is it legitimately starting from scratch? 

The solution is to record in some persistent store that the broker has written 
something to disk, as soon as it does. For example, it could add a mark to the 
partition metadata once it creates the directory. Another way is to use some 
form of dbid, a number that identifies the current instance of the disk state: 
the broker writes the dbid to the drive and immediately afterwards to ZK, and 
upon restarting the two must match. Note that the broker can't simply write the 
dbid to the registration znode, which is ephemeral. It must be a persistent 
znode.
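
A minimal sketch of the dbid check described here, assuming the raw ZooKeeper 
client; the file name and znode path are hypothetical, not actual Kafka layout:

    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.UUID;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class DbidCheck {
        // First boot: write the dbid to disk, then to a persistent znode.
        // Restart: disk and ZK must agree, otherwise refuse to start.
        static void verifyOrCreate(ZooKeeper zk, int brokerId, Path logDir)
                throws Exception {
            Path dbidFile = logDir.resolve("dbid");          // hypothetical file name
            String znode = "/brokers/dbids/" + brokerId;     // persistent, not ephemeral
            if (Files.exists(dbidFile)) {
                String onDisk = new String(Files.readAllBytes(dbidFile),
                                           StandardCharsets.UTF_8);
                String inZk = new String(zk.getData(znode, false, null),
                                         StandardCharsets.UTF_8);
                if (!onDisk.equals(inZk))
                    throw new IllegalStateException(
                            "dbid mismatch: disk state lost or wrong data directory");
            } else {
                String dbid = UUID.randomUUID().toString();
                Files.write(dbidFile, dbid.getBytes(StandardCharsets.UTF_8));
                // if the znode already exists here, create() throws, which
                // likewise flags a broker that has lost its disk state
                zk.create(znode, dbid.getBytes(StandardCharsets.UTF_8),
                          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            }
        }
    }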

> Prevent broker from re-replicating / losing data due to disk misconfiguration
> -
>
> Key: KAFKA-2510
> URL: https://issues.apache.org/jira/browse/KAFKA-2510
> Project: Kafka
>  Issue Type: Bug
>Reporter: Gwen Shapira
>
> Currently Kafka assumes that whatever it sees in the data directory is the 
> correct state of the data.
> This means that if an admin mistakenly configures Chef to use the wrong data 
> directory, one of the following can happen:
> 1. The broker will replicate a bunch of partitions and take over the network.
> 2. If this happens to enough brokers, you can lose entire topics and 
> partitions.
> We have information about existing topics, partitions and their ISR in 
> zookeeper.
> We need a mode in which, if a broker starts, is in the ISR for a partition, 
> and doesn't have any data or directory for the partition, the broker will 
> issue a huge ERROR in the log and refuse to do anything for the partition.
> [~fpj] worked on the problem for ZK and had some ideas on what is required 
> here. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1929) Convert core kafka module to use the errors in org.apache.kafka.common.errors

2015-09-04 Thread Jeff Holoman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730833#comment-14730833
 ] 

Jeff Holoman commented on KAFKA-1929:
-

Jason, I'm not working on this at this time. It was previously picked up by 
[~granthenke] but I don't think he's working on it currently. 




> Convert core kafka module to use the errors in org.apache.kafka.common.errors
> -
>
> Key: KAFKA-1929
> URL: https://issues.apache.org/jira/browse/KAFKA-1929
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jay Kreps
>Assignee: Grant Henke
> Attachments: KAFKA-1929.patch
>
>
> With the introduction of the common package there are now a lot of errors 
> duplicated in both the common package and in the server. We should refactor 
> the server code (but not the scala clients) to switch over to the exceptions 
> in common.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2519) NetworkClient.close should remove node from inFlightRequests

2015-09-04 Thread Ismael Juma (JIRA)
Ismael Juma created KAFKA-2519:
--

 Summary: NetworkClient.close should remove node from 
inFlightRequests
 Key: KAFKA-2519
 URL: https://issues.apache.org/jira/browse/KAFKA-2519
 Project: Kafka
  Issue Type: Bug
Reporter: Ismael Juma
Assignee: Ismael Juma
Priority: Critical
 Fix For: 0.8.3
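
The issue has no description, but the title states the fix; as a hypothetical 
illustration of the bookkeeping involved (echoing the inFlightRequests.clearAll 
call that appears in the KAFKA-2120 review below, not the actual NetworkClient 
code):

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashMap;
    import java.util.Map;

    class InFlightBookkeeping {
        // node id -> requests sent but not yet answered
        private final Map<String, Deque<String>> inFlightRequests = new HashMap<>();

        // Closing a connection must also clear the node's in-flight queue;
        // otherwise a later reconnect to the same node sees stale entries.
        // The caller should fail the returned requests so waiters are notified.
        Deque<String> close(String node) {
            Deque<String> stale = inFlightRequests.remove(node);
            return stale == null ? new ArrayDeque<>() : stale;
        }
    }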






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (KAFKA-2519) NetworkClient.close should remove node from inFlightRequests

2015-09-04 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-2519 started by Ismael Juma.
--
> NetworkClient.close should remove node from inFlightRequests
> 
>
> Key: KAFKA-2519
> URL: https://issues.apache.org/jira/browse/KAFKA-2519
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>Priority: Critical
> Fix For: 0.8.3
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1070) Auto-assign node id

2015-09-04 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730892#comment-14730892
 ] 

Sriharsha Chintalapani commented on KAFKA-1070:
---

[~amuraru] Patch is already merged into trunk.

> Auto-assign node id
> ---
>
> Key: KAFKA-1070
> URL: https://issues.apache.org/jira/browse/KAFKA-1070
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jay Kreps
>Assignee: Sriharsha Chintalapani
>  Labels: usability
> Fix For: 0.8.3
>
> Attachments: KAFKA-1070.patch, KAFKA-1070_2014-07-19_16:06:13.patch, 
> KAFKA-1070_2014-07-22_11:34:18.patch, KAFKA-1070_2014-07-24_20:58:17.patch, 
> KAFKA-1070_2014-07-24_21:05:33.patch, KAFKA-1070_2014-08-21_10:26:20.patch, 
> KAFKA-1070_2014-11-20_10:50:04.patch, KAFKA-1070_2014-11-25_20:29:37.patch, 
> KAFKA-1070_2015-01-01_17:39:30.patch, KAFKA-1070_2015-01-12_10:46:54.patch, 
> KAFKA-1070_2015-01-12_18:30:17.patch
>
>
> It would be nice to have Kafka brokers auto-assign node ids rather than 
> having that be a configuration. Having a configuration is irritating because 
> (1) you have to generate a custom config for each broker and (2) even though 
> it is in configuration, changing the node id can cause all kinds of bad 
> things to happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-2519; NetworkClient.close should remove ...

2015-09-04 Thread ijuma
GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/193

KAFKA-2519; NetworkClient.close should remove node from inFlightRequests



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka 
kafka-2519-network-client-close-remove-in-flight

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/193.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #193


commit 8ff38fee7c194901cdcf168dd41eb969c6aaca47
Author: Ismael Juma 
Date:   2015-09-04T14:56:08Z

NetworkClient.close should remove node from inFlightRequests




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-2519) NetworkClient.close should remove node from inFlightRequests

2015-09-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730922#comment-14730922
 ] 

ASF GitHub Bot commented on KAFKA-2519:
---

GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/193

KAFKA-2519; NetworkClient.close should remove node from inFlightRequests



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka 
kafka-2519-network-client-close-remove-in-flight

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/193.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #193


commit 8ff38fee7c194901cdcf168dd41eb969c6aaca47
Author: Ismael Juma 
Date:   2015-09-04T14:56:08Z

NetworkClient.close should remove node from inFlightRequests




> NetworkClient.close should remove node from inFlightRequests
> 
>
> Key: KAFKA-2519
> URL: https://issues.apache.org/jira/browse/KAFKA-2519
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>Priority: Critical
> Fix For: 0.8.3
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2519) NetworkClient.close should remove node from inFlightRequests

2015-09-04 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-2519:
---
Reviewer: Jun Rao
  Status: Patch Available  (was: In Progress)

[~junrao], can you please review this. It's a small fix for an important issue 
(I think this is the reason for a transient test failure I saw yesterday).

> NetworkClient.close should remove node from inFlightRequests
> 
>
> Key: KAFKA-2519
> URL: https://issues.apache.org/jira/browse/KAFKA-2519
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>Priority: Critical
> Fix For: 0.8.3
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 36858: Patch for KAFKA-2120

2015-09-04 Thread Jun Rao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36858/#review97710
---



clients/src/main/java/org/apache/kafka/clients/NetworkClient.java (lines 361 - 
366)


But those closed connections are only added to the disconnected list on the 
next selector.select() call right? So, you still have the issue that after a 
networkClient.poll() call, some socket connections are already cancelled but 
the connectionState is not reflecting that (need to wait for next poll() call). 
Also, it's a bit weird that the disconnect is initated in NetworkClient, but we 
have to push the info through selector and get it back.

I was thinking that we can call handleTimedOutRequests() after 
handleConnections(). In that call, we first figure out the nodeIds that need to 
be closed. Then call selector.close() and for each such node, call the 
following code.

connectionStates.disconnected(node);
log.debug("Node {} disconnected.", node);
for (ClientRequest request : this.inFlightRequests.clearAll(node)) {
    log.trace("Cancelled request {} due to node {} being disconnected", request, node);
    if (!metadataUpdater.maybeHandleDisconnection(request))
        responses.add(new ClientResponse(request, now, true, null));
}

The above code can be put in a private method and be reused in 
handleDisconnected().

Then, we can get rid of selector.disconnect and replace existing usage with 
selector.close instead.


- Jun Rao


On Sept. 3, 2015, 10:12 p.m., Mayuresh Gharat wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36858/
> ---
> 
> (Updated Sept. 3, 2015, 10:12 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-2120
> https://issues.apache.org/jira/browse/KAFKA-2120
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Solved compile error
> 
> 
> Addressed Jason's comments for Kip-19
> 
> 
> Addressed Jun's comments
> 
> 
> Addressed Jason's comments about the default values for requestTimeout
> 
> 
> checkpoint
> 
> 
> Addressed Joel's concerns. Also tried to include Jun's feedback.
> 
> 
> Diffs
> -
> 
>   clients/src/main/java/org/apache/kafka/clients/ClientRequest.java 
> dc8f0f115bcda893c95d17c0a57be8d14518d034 
>   clients/src/main/java/org/apache/kafka/clients/CommonClientConfigs.java 
> 7d24c6f5dd2b63b96584f3aa8922a1d048dc1ae4 
>   clients/src/main/java/org/apache/kafka/clients/InFlightRequests.java 
> 15d00d4e484bb5d51a9ae6857ed6e024a2cc1820 
>   clients/src/main/java/org/apache/kafka/clients/KafkaClient.java 
> 7ab2503794ff3aab39df881bd9fbae6547827d3b 
>   clients/src/main/java/org/apache/kafka/clients/NetworkClient.java 
> b31f7f1fbf93d29268b93811c9aad3e3c18e5312 
>   clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerConfig.java 
> b9a2d4e2bc565f0ee72b27791afe5c894af262f1 
>   clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java 
> 938981c23ec16dfaf81d1e647929a59e1572f40f 
>   
> clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerNetworkClient.java
>  9517d9d0cd480d5ba1d12f1fde7963e60528d2f8 
>   clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java 
> 804d569498396d431880641041fc9292076452cb 
>   clients/src/main/java/org/apache/kafka/clients/producer/ProducerConfig.java 
> 06f00a99a73a288df9afa8c1d4abe3580fa968a6 
>   
> clients/src/main/java/org/apache/kafka/clients/producer/internals/BufferPool.java
>  4cb1e50d6c4ed55241aeaef1d3af09def5274103 
>   
> clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java
>  a152bd7697dca55609a9ec4cfe0a82c10595fbc3 
>   
> clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordBatch.java
>  06182db1c3a5da85648199b4c0c98b80ea7c6c0c 
>   
> clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java 
> d2e64f7cd8bf56e433a210905b2874f71eee9ea0 
>   clients/src/main/java/org/apache/kafka/common/network/Selector.java 
> f49d54cbc1915ac686ff70ac657f08e4c96489c1 
>   clients/src/test/java/org/apache/kafka/clients/MockClient.java 
> 9133d85342b11ba2c9888d4d2804d181831e7a8e 
>   clients/src/test/java/org/apache/kafka/clients/NetworkClientTest.java 
> 43238ceaad0322e39802b615bb805b895336a009 
>   
> clients/src/test/java/org/apache/kafka/clients/producer/internals/BufferPoolTest.java
>  2c693824fa53db1e38766b8c66a0ef42ef9d0f3a 
>   
> clients/src/test/java/org/apache/kafka/clients/producer/internals/RecordAccumulatorTest.java
>  5b2e4ffaeab7127648db608c179703b27b577414 
>   
> clients/src/test/java/org/apache

[jira] [Updated] (KAFKA-2519) NetworkClient.close should remove node from inFlightRequests

2015-09-04 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-2519:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Issue resolved by pull request 193
[https://github.com/apache/kafka/pull/193]

> NetworkClient.close should remove node from inFlightRequests
> 
>
> Key: KAFKA-2519
> URL: https://issues.apache.org/jira/browse/KAFKA-2519
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>Priority: Critical
> Fix For: 0.8.3
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2519) NetworkClient.close should remove node from inFlightRequests

2015-09-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731030#comment-14731030
 ] 

ASF GitHub Bot commented on KAFKA-2519:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/193


> NetworkClient.close should remove node from inFlightRequests
> 
>
> Key: KAFKA-2519
> URL: https://issues.apache.org/jira/browse/KAFKA-2519
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>Priority: Critical
> Fix For: 0.8.3
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-2519; NetworkClient.close should remove ...

2015-09-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/193


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-31 - Message format change proposal

2015-09-04 Thread Ewen Cheslack-Postava
On Thu, Sep 3, 2015 at 11:45 PM, Jiangjie Qin wrote:

> Hey Ewen,
>
> Thanks for the comments; they are really good questions. Please see my
> replies inline.
>
> On Thu, Sep 3, 2015 at 9:35 PM, Ewen Cheslack-Postava wrote:
>
> > A few questions:
> >
> > 1. If we update the producers to only support V1, doesn't that mean
> people
> > get stuck on the current version of the producer until they can be sure
> all
> > their consumers have been upgraded? Is that going to be a problem for
> > anyone, and does it potentially keep important fixes/enhancements (e.g.
> the
> > upcoming client-side network request timeouts) because they have to wait
> > for all the consumers to upgrade first?
> >
> This is a good point. I thought about this before, and my initial thinking
> is that we might need to add a config on the producer to specify which
> version to use when producing. But this seems to be a pretty ad-hoc
> approach and I don't really like it. We are working on a general protocol
> version control mechanism proposal and will have a separate KIP for that.
>

The configs are annoying, but also can provide a way for us to guarantee a
smooth upgrade for users that use the defaults and upgrade their entire
infrastructure one version at a time:

v0.8.3:
no support for v1

v0.9.0:
broker: accept.format=v0 (v1 is rejected)
producer: produce.format=v0
consumer: include support for v0,v1

v0.9.1
broker: accept.format=v1
producer: produce.format=v1
consumer: v0,v1



> >
> > 2. Why minute granularity specifically? That's seems fairly specific to
> > LI's example workload (which, by the way, is a very useful reference
> point
> > to include in the KIP, thanks for that). If someone has the memory and
> can
> > support it, why not let them go all the way to per-record timestamps (by
> > making the interval configurable and they can set it really small/to 0)?
> It
> > seems to me that some people might be willing to pay that cost for the
> > application-side simplicity if they want to consume from specific
> > timestamps. With the proposal as is, seeking to a timestamp still
> requires
> > client-side logic to filter out messages that might have earlier
> timestamps
> > due to the granularity chosen. The obvious tradeoff here is yet another
> > config option -- I'm loath to add yet more configs, although this is one
> > that we can choose a reasonable default for (like 1 minute) and people
> can
> > easily change with no impact if they need to adjust it.
> >
> The search granularity will actually be millisecond. The index granularity
> only determines how close you will be to the actual message with the
> timestamp you are looking for. For example, if you are looking for a
> message with timestamp 10:00:15, a minute-granularity index will give you
> the offset at 10:00:00, and the search needs to go through the records from
> 10:00:00 to 10:00:15 to find the message. With second-level granularity, it
> might only need to go through the messages produced in one second. So a
> minute-level index will take longer to search, but the precision will be
> the same as with a second-level index. That said, I am not objecting to
> adding a granularity configuration, but I am not sure how useful a
> second-level index would be, because I think typically a consumer will be
> long-running and only search for a timestamp at startup.
> I will update the KIP page to clarify the precision.
>
>
Ok, this makes sense.
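
A toy sketch of the index behavior Jiangjie describes, assuming an in-memory 
map (illustrative names, not Kafka's actual index classes): the granularity 
only decides how far before the target the forward scan starts, so lookups 
stay millisecond-precise.

    import java.util.Map;
    import java.util.NavigableMap;
    import java.util.TreeMap;

    public class CoarseTimeIndex {
        private final long granularityMs;
        // first offset observed in each granularity bucket
        private final NavigableMap<Long, Long> firstOffset = new TreeMap<>();

        public CoarseTimeIndex(long granularityMs) {
            this.granularityMs = granularityMs;
        }

        public void maybeAppend(long timestampMs, long offset) {
            firstOffset.putIfAbsent(timestampMs - timestampMs % granularityMs, offset);
        }

        // With one-minute granularity, a lookup for 10:00:15 returns the offset
        // recorded at 10:00:00; the caller scans forward from there to 10:00:15.
        public long lookup(long targetTimestampMs) {
            Map.Entry<Long, Long> floor = firstOffset.floorEntry(targetTimestampMs);
            return floor == null ? 0L : floor.getValue();
        }
    }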


> >
>
> 3. Why not respect the timestamp passed in as long as it's not some
> > sentinel value that indicates the broker should fill it in? When you do
> > know they will be ordered, as they should be with mirror maker (which is
> > specifically mentioned), this seems really useful. (More about this in
> > questions below...)
> >
> Like what you mentioned in (4), having a log without monotonically
> increasing timestamps is weird. To me it is even worse than having an empty
> timestamp field in the inner message that will not be used except for
> log-compacted topics. I think the only way to solve this issue is to add
> another CreateTime to the message. So far I am not sure how useful that is,
> though, because arguably people can always put this timestamp inside the
> payload. So I think this timestamp is more for server-side usage than for
> application / client-side usage.
>

I think setting it broker-side is fine, I just want to make sure user
expectations are clear. The existing search-by-timestamp has obvious
limitations. This proposal makes it much more accurate, so it needs to be
clear who is responsible for assigning the timestamps and what they mean
for an application. Applications that care about when the message was
actually created will still need to do some additional work.


>
> >
> > 4. I think one obvious answer to (3) is that accepting client's
> timestamps
> > could break seeking-by-timestamp since they may not be properly ordered.
> > However, I think this can break anyway under normal operat

[jira] [Commented] (KAFKA-2397) leave group request

2015-09-04 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731052#comment-14731052
 ] 

Jay Kreps commented on KAFKA-2397:
--

Couple thoughts:
1. [~hachikuji] Does this need to be in the next release? This is really an 
optimization we can do at any time right? How bad of a user experience is it 
not to have it?
2. Does anyone have a concrete idea of where using TCP close to signal 
disconnect falls short? [~becket_qin] I think you are saying this is a problem 
but when is it actually a problem? This might be one where broader input could 
help...
3. We shouldn't end up with two different ways to do the same thing just 
because two ways have been proposed and we aren't sure yet which is best. This 
just means we aren't done thinking through the design. I think likely zero is 
better than two, right?

> leave group request
> ---
>
> Key: KAFKA-2397
> URL: https://issues.apache.org/jira/browse/KAFKA-2397
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Onur Karaman
>Assignee: Onur Karaman
>Priority: Minor
> Fix For: 0.8.3
>
>
> Let's say every consumer in a group has session timeout s. Currently, if a 
> consumer leaves the group, the worst case time to stabilize the group is 2s 
> (s to detect the consumer failure + s for the rebalance window). If a 
> consumer instead can declare they are leaving the group, the worst case time 
> to stabilize the group would just be the s associated with the rebalance 
> window.
> This is a low priority optimization!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2397) leave group request

2015-09-04 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731073#comment-14731073
 ] 

Jason Gustafson commented on KAFKA-2397:


[~jkreps] Not having a way to properly leave the group is pretty painful in 
testing, which I imagine users are going to be doing a lot of initially. I 
think it also makes rolling upgrades trickier if you don't have it since you 
have to allow additional time for the group to stabilize after each machine is 
upgraded. The ideal workflow to minimize rebalance overhead would probably be 
to shutdown one instance, let the group stabilize, then restart it. If you just 
restart the instance, then the whole group will have to pause until the old 
member's session timeout has expired (Although you can also get around this by 
persisting the consumer id).

Anyway, I'd rather have something if possible, but I agree that it could be 
pushed to another release if we think the TCP option is the way forward.

> leave group request
> ---
>
> Key: KAFKA-2397
> URL: https://issues.apache.org/jira/browse/KAFKA-2397
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Onur Karaman
>Assignee: Onur Karaman
>Priority: Minor
> Fix For: 0.8.3
>
>
> Let's say every consumer in a group has session timeout s. Currently, if a 
> consumer leaves the group, the worst case time to stabilize the group is 2s 
> (s to detect the consumer failure + s for the rebalance window). If a 
> consumer instead can declare they are leaving the group, the worst case time 
> to stabilize the group would just be the s associated with the rebalance 
> window.
> This is a low priority optimization!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: Kafka-trunk #616

2015-09-04 Thread Apache Jenkins Server
See 

Changes:

[junrao] KAFKA-2519; NetworkClient.close should remove node from 
inFlightRequests

--
[...truncated 1662 lines...]
kafka.coordinator.ConsumerCoordinatorResponseTest > 
testJoinGroupUnknownPartitionAssignmentStrategy PASSED

kafka.coordinator.ConsumerCoordinatorResponseTest > 
testJoinGroupInconsistentPartitionAssignmentStrategy PASSED

kafka.log.LogTest > testCorruptLog PASSED

kafka.coordinator.ConsumerCoordinatorResponseTest > 
testJoinGroupUnknownConsumerNewGroup PASSED

kafka.coordinator.ConsumerCoordinatorResponseTest > testValidJoinGroup PASSED

kafka.coordinator.ConsumerCoordinatorResponseTest > 
testHeartbeatUnknownConsumerExistingGroup PASSED

kafka.admin.DeleteConsumerGroupTest > testGroupWideDeleteInZK PASSED

kafka.coordinator.ConsumerCoordinatorResponseTest > testValidHeartbeat PASSED

kafka.admin.AddPartitionsTest > testManualAssignmentOfReplicas PASSED

kafka.log.LogTest > testLogRecoversToCorrectOffset PASSED

kafka.api.QuotasTest > testThrottledProducerConsumer PASSED

kafka.utils.timer.TimerTaskListTest > testAll PASSED

kafka.integration.FetcherTest > testFetcher PASSED

kafka.common.ConfigTest > testInvalidGroupIds PASSED

kafka.common.ConfigTest > testInvalidClientIds PASSED

kafka.log.LogTest > testReopenThenTruncate PASSED

kafka.log.LogTest > testParseTopicPartitionNameForMissingPartition PASSED

kafka.log.LogTest > testParseTopicPartitionNameForEmptyName PASSED

kafka.log.LogTest > testOpenDeletesObsoleteFiles PASSED

kafka.log.LogTest > testSizeBasedLogRoll PASSED

kafka.log.LogTest > testTimeBasedLogRollJitter PASSED

kafka.log.LogTest > testParseTopicPartitionName PASSED

kafka.log.LogTest > testTruncateTo PASSED

kafka.log.LogTest > testCleanShutdownFile PASSED

kafka.admin.DeleteTopicTest > testDeleteTopicAlreadyMarkedAsDeleted PASSED

kafka.admin.AdminTest > testShutdownBroker PASSED

kafka.api.ConsumerTest > testSeek PASSED

kafka.admin.AdminTest > testTopicCreationWithCollision PASSED

kafka.admin.AdminTest > testTopicCreationInZK PASSED

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooLow PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[0] PASSED

kafka.api.ProducerSendTest > testCloseWithZeroTimeoutFromSenderThread PASSED

kafka.utils.SchedulerTest > testMockSchedulerNonPeriodicTask PASSED

kafka.utils.SchedulerTest > testMockSchedulerPeriodicTask PASSED

kafka.utils.SchedulerTest > testNonPeriodicTask PASSED

kafka.utils.SchedulerTest > testRestart PASSED

kafka.utils.SchedulerTest > testReentrantTaskInMockScheduler PASSED

kafka.utils.SchedulerTest > testPeriodicTask PASSED

kafka.api.ProducerFailureHandlingTest > testCannotSendToInternalTopic PASSED

kafka.integration.AutoOffsetResetTest > testResetToEarliestWhenOffsetTooLow 
PASSED

kafka.admin.AddPartitionsTest > testReplicaPlacement PASSED

kafka.api.ProducerFailureHandlingTest > testTooLargeRecordWithAckOne PASSED

kafka.integration.AutoOffsetResetTest > testResetToEarliestWhenOffsetTooHigh 
PASSED

kafka.admin.DeleteTopicTest > testPartitionReassignmentDuringDeleteTopic PASSED

kafka.api.ConsumerTest > testPositionAndCommit PASSED

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooHigh 
PASSED

kafka.admin.DeleteTopicTest > testDeleteNonExistingTopic PASSED

kafka.integration.UncleanLeaderElectionTest > testUncleanLeaderElectionEnabled 
PASSED

kafka.api.ProducerFailureHandlingTest > testWrongBrokerList PASSED

kafka.api.ConsumerTest > testUnsubscribeTopic PASSED

kafka.api.ConsumerBounceTest > testConsumptionWithBrokerFailures PASSED

kafka.utils.IteratorTemplateTest > testIterator PASSED

kafka.integration.RollingBounceTest > testRollingBounce PASSED

kafka.common.TopicTest > testInvalidTopicNames PASSED

kafka.common.TopicTest > testTopicHasCollision PASSED

kafka.common.TopicTest > testTopicHasCollisionChars PASSED

kafka.api.ProducerFailureHandlingTest > testNotEnoughReplicas PASSED

kafka.admin.DeleteTopicTest > testRecreateTopicAfterDeletion PASSED

kafka.api.ConsumerTest > testListTopics PASSED

kafka.admin.DeleteTopicTest > testAddPartitionDuringDeleteTopic PASSED

kafka.api.ProducerFailureHandlingTest > testNoResponse PASSED

kafka.api.ConsumerTest > testExpandingTopicSubscriptions PASSED

kafka.admin.DeleteTopicTest > testDeleteTopicWithAllAliveReplicas PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[1] PASSED

kafka.api.ProducerFailureHandlingTest > testNonExistentTopic PASSED

kafka.api.ConsumerTest > testGroupConsumption PASSED

kafka.integration.UncleanLeaderElectionTest > 
testCleanLeaderElectionDisabledByTopicOverride PASSED

kafka.admin.DeleteTopicTest > testDeleteTopicDuringAddPartition PASSED

kafka.api.ProducerFailureHandlingTest > testInvalidPartition PASSED

kafka.integration.PrimitiveApiTest > testMultiProduce PASSED

kafka.api.ConsumerTest > testPartitionsFor PASSED

kafka.api.ProducerFailureHandlingTest > testSendAfter

Re: Review Request 36858: Patch for KAFKA-2120

2015-09-04 Thread Mayuresh Gharat


> On Sept. 4, 2015, 4:07 p.m., Jun Rao wrote:
> > clients/src/main/java/org/apache/kafka/clients/NetworkClient.java, lines 
> > 362-367
> > 
> >
> > But those closed connections are only added to the disconnected list on 
> > the next selector.select() call right? So, you still have the issue that 
> > after a networkClient.poll() call, some socket connections are already 
> > cancelled but the connectionState is not reflecting that (need to wait for 
> > next poll() call). Also, it's a bit weird that the disconnect is initated 
> > in NetworkClient, but we have to push the info through selector and get it 
> > back.
> > 
> > I was thinking that we can call handleTimedOutRequests() after 
> > handleConnections(). In that call, we first figure out the nodeIds that 
> > need to be closed. Then call selector.close() and for each such node, call 
> > the following code.
> > 
> > connectionStates.disconnected(node);
> > log.debug("Node {} disconnected.", node);
> > for (ClientRequest request : this.inFlightRequests.clearAll(node)) {
> >     log.trace("Cancelled request {} due to node {} being disconnected", request, node);
> >     if (!metadataUpdater.maybeHandleDisconnection(request))
> >         responses.add(new ClientResponse(request, now, true, null));
> > }
> > 
> > The above code can be put in a private method and be reused in 
> > handleDisconnected().
> > 
> > Then, we can get rid of selector.disconnect and replace existing usage 
> > with selector.close instead.

Hi Jun,

Thanks a lot for the comments.

Regarding the concern:

"But those closed connections are only added to the disconnected list on the 
next selector.select() call right? So, you still have the issue that after a 
networkClient.poll() call, some socket connections are already cancelled but 
the connectionState is not reflecting that (need to wait for next poll() 
call)" -- I am adding the explicitly disconnected nodes to the list of 
disconnected nodes within the same NetworkClient.poll() itself. I add them to a 
list called clientDisconnects. When the NetworkClient calls 
handleDisconnections(), it calls selector.disconnected(). Inside 
selector.disconnected(), I check whether there are any entries in 
clientDisconnects, add them to the disconnected list, and clear the 
clientDisconnects list. This means that every call to disconnected() will 
return all the nodes disconnected in that NetworkClient.poll(), so the 
explicitly disconnected nodes are added to the disconnected list in the same 
NetworkClient.poll().

But this approach somewhat changes what the selector.disconnect() javadoc says. 
I thought that if a client explicitly calls disconnect(), they would expect the 
disconnection to happen immediately. The approach you suggested above avoids 
making this change, and I agree with it.

I think KAFKA-2411 introduced some changes, and I will need to rebase this 
patch on top of that. I hit some test failures when I rebased, so I will get 
those solved, incorporate the approach you suggested above, and upload a new 
patch.


- Mayuresh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36858/#review97710
---


On Sept. 3, 2015, 10:12 p.m., Mayuresh Gharat wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36858/
> ---
> 
> (Updated Sept. 3, 2015, 10:12 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-2120
> https://issues.apache.org/jira/browse/KAFKA-2120
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Solved compile error
> 
> 
> Addressed Jason's comments for Kip-19
> 
> 
> Addressed Jun's comments
> 
> 
> Addressed Jason's comments about the default values for requestTimeout
> 
> 
> checkpoint
> 
> 
> Addressed Joel's concerns. Also tried to include Jun's feedback.
> 
> 
> Diffs
> -
> 
>   clients/src/main/java/org/apache/kafka/clients/ClientRequest.java 
> dc8f0f115bcda893c95d17c0a57be8d14518d034 
>   clients/src/main/java/org/apache/kafka/clients/CommonClientConfigs.java 
> 7d24c6f5dd2b63b96584f3aa8922a1d048dc1ae4 
>   clients/src/main/java/org/apache/kafka/clients/InFlightRequests.java 
> 15d00d4e484bb5d51a9ae6857ed6e024a2cc1820 
>   clients/src/main/java/org/apache/kafka/clients/KafkaClient.java 
> 7ab2503794ff3aab39df881bd9fbae6547827d3b 
>   clients/src/main/java/org/apache/kafka/clients/NetworkClient.java 
> b31f7f1fbf93d29268b93811c9aad3e3c18e5312 
>   clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerConfig.jav

[jira] [Commented] (KAFKA-2397) leave group request

2015-09-04 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731091#comment-14731091
 ] 

Jay Kreps commented on KAFKA-2397:
--

[~hachikuji] Makes sense.

> leave group request
> ---
>
> Key: KAFKA-2397
> URL: https://issues.apache.org/jira/browse/KAFKA-2397
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Onur Karaman
>Assignee: Onur Karaman
>Priority: Minor
> Fix For: 0.8.3
>
>
> Let's say every consumer in a group has session timeout s. Currently, if a 
> consumer leaves the group, the worst case time to stabilize the group is 2s 
> (s to detect the consumer failure + s for the rebalance window). If a 
> consumer instead can declare they are leaving the group, the worst case time 
> to stabilize the group would just be the s associated with the rebalance 
> window.
> This is a low priority optimization!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-2491; update ErrorMapping with new consu...

2015-09-04 Thread hachikuji
Github user hachikuji closed the pull request at:

https://github.com/apache/kafka/pull/188


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-2491) Update ErrorMapping with New Consumer Errors

2015-09-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731126#comment-14731126
 ] 

ASF GitHub Bot commented on KAFKA-2491:
---

Github user hachikuji closed the pull request at:

https://github.com/apache/kafka/pull/188


> Update ErrorMapping with New Consumer Errors
> 
>
> Key: KAFKA-2491
> URL: https://issues.apache.org/jira/browse/KAFKA-2491
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Minor
> Fix For: 0.8.3
>
>
> Some errors used by the new consumer have not been added to ErrorMapping. 
> Until this class is removed, it should probably be kept consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2491) Update ErrorMapping with New Consumer Errors

2015-09-04 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-2491:
---
Resolution: Not A Problem
Status: Resolved  (was: Patch Available)

Already fixed in the patch for KAFKA-2210.

> Update ErrorMapping with New Consumer Errors
> 
>
> Key: KAFKA-2491
> URL: https://issues.apache.org/jira/browse/KAFKA-2491
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Minor
> Fix For: 0.8.3
>
>
> Some errors used by the new consumer have not been added to ErrorMapping. 
> Until this class is removed, it should probably be kept consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2477) Replicas spuriously deleting all segments in partition

2015-09-04 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731210#comment-14731210
 ] 

Jun Rao commented on KAFKA-2477:


Thanks. Then the log looks normal. The only thing I can recommend now is to 
try reproducing the issue locally with trace-level logging applied.

Also, since you are using snappy, you may want to apply the fixes in 0.8.2.2 
(https://people.apache.org/~junrao/kafka-0.8.2.2-candidate1/RELEASE_NOTES.html) 
once it's out. They may not be related to the issue that you are seeing here, 
though.

> Replicas spuriously deleting all segments in partition
> --
>
> Key: KAFKA-2477
> URL: https://issues.apache.org/jira/browse/KAFKA-2477
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1
>Reporter: Håkon Hitland
> Attachments: kafka_log.txt
>
>
> We're seeing some strange behaviour in brokers: a replica will sometimes 
> schedule all segments in a partition for deletion, and then immediately start 
> replicating them back, triggering our check for under-replicating topics.
> This happens on average a couple of times a week, for different brokers and 
> topics.
> We have per-topic retention.ms and retention.bytes configuration, the topics 
> where we've seen this happen are hitting the size limit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2513) Checkstyle is not executed until gradle's tests phase

2015-09-04 Thread Ashish K Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731262#comment-14731262
 ] 

Ashish K Singh commented on KAFKA-2513:
---

I was suggesting this as a nice-to-have. However, if you have a strong opinion 
against it, we can drop it.

> Checkstyle is not executed until gradle's tests phase
> -
>
> Key: KAFKA-2513
> URL: https://issues.apache.org/jira/browse/KAFKA-2513
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ashish K Singh
>Assignee: Ashish K Singh
>Priority: Minor
>
> Checkstyle is added as a dependency of test, so until someone runs the tests 
> they won't catch checkstyle issues. To me, code style checking belongs with 
> compile rather than with tests. This was also brought up on KAFKA-1893. Maybe 
> we should make checkstyle a dependency of jar, instead of test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-2513) Checkstyle is not executed until gradle's tests phase

2015-09-04 Thread Ashish K Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish K Singh resolved KAFKA-2513.
---
Resolution: Won't Fix

> Checkstyle is not executed until gradle's tests phase
> -
>
> Key: KAFKA-2513
> URL: https://issues.apache.org/jira/browse/KAFKA-2513
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ashish K Singh
>Assignee: Ashish K Singh
>Priority: Minor
>
> Checkstyle is added as a dependency of test, so until someone runs the tests 
> they won't catch checkstyle issues. To me, code style checking belongs with 
> compile rather than with tests. This was also brought up on KAFKA-1893. Maybe 
> we should make checkstyle a dependency of jar, instead of test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2513) Checkstyle is not executed until gradle's tests phase

2015-09-04 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731295#comment-14731295
 ] 

Ismael Juma commented on KAFKA-2513:


[~ewencp], by your argument, checkstyle shouldn't prevent one from running the 
unit/integration tests either (and it does). It's also worth recognising that 
this is a trade-off and the current approach does waste people's time in other 
situations. Personally I find it quite annoying to have a number of checkstyle 
failures when I think the code is in a good state and it's time to run the 
tests. I'd much prefer to get the failures incrementally as I am compiling so 
that I can fix it as I write the code (and I am near the code).

Anyway, I don't think there is a clearly superior approach; it depends a lot on 
the workflow, and it's good to understand the different ones. I have a few 
ideas for how to make things better, but that's not at the top of my priority 
queue at the moment. Hopefully not too far into the future.

> Checkstyle is not executed until gradle's tests phase
> -
>
> Key: KAFKA-2513
> URL: https://issues.apache.org/jira/browse/KAFKA-2513
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ashish K Singh
>Assignee: Ashish K Singh
>Priority: Minor
>
> Checkstyle is added as a dependency to test, so until someone runs the tests 
> they won't catch checkstyle issues. To me, code style checks belong with 
> compile rather than with tests. This was also brought up in KAFKA-1893. Maybe 
> we should make checkstyle a dependency of jar instead of test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2397) leave group request

2015-09-04 Thread Onur Karaman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731364#comment-14731364
 ] 

Onur Karaman commented on KAFKA-2397:
-

Here's a summary of what I think each approach has to offer over the other.

Pros of LeaveGroupRequest:
1. Simplicity. The logic is pretty straightforward, keeps the network and api 
layers separated, fits well with existing patterns, and doesn't require a 
complicated refactoring.
2. Opens up the possibility of tooling that lets you kick out a consumer and 
trigger a rebalance. This might be a useful admin tool for when things go wrong.
3. Opens up the possibility of rolling bounces without triggering a rebalance. 
We can modify the consumer to have a close and a closeNow (close sends out a 
LeaveGroupRequest, closeNow doesn't). The application can persist the consumer 
id somewhere, and the consumer can initially try out the persisted consumer id 
after it comes back up (see the sketch after the lists below).

Pros of tcp disconnect:
1. A rebalance gets triggered on process death. This becomes a con if you want 
the possibility of rolling bounces without triggering a rebalance.
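To make pro 3 concrete, here is a minimal sketch from the application's side. 
Everything in it is hypothetical: closeNow(), memberId(), and consumer-id 
persistence are proposals in this thread, not part of the current 
KafkaConsumer API.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    // Hypothetical consumer interface, for illustration only.
    interface GroupConsumer {
        String memberId();  // id assigned by the coordinator
        void close();       // sends LeaveGroupRequest: immediate rebalance
        void closeNow();    // closes sockets only: the coordinator waits for
                            // the session timeout before rebalancing
    }

    public class RollingBounceShutdown {
        static void shutdown(GroupConsumer consumer, boolean restartingSoon)
                throws IOException {
            if (restartingSoon) {
                // Keep our member slot: persist the id so the restarted
                // process can try to reclaim it and avoid a rebalance.
                Files.write(Paths.get("/tmp/consumer.id"),
                            consumer.memberId().getBytes());
                consumer.closeNow();
            } else {
                // Normal shutdown: leave the group so it stabilizes without
                // waiting out a full session timeout.
                consumer.close();
            }
        }
    }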

P.S.: I'm going to be on vacation from tonight to Tuesday, so my responsiveness 
will be a bit spotty until then. I think Jiangjie may be in a similar 
situation.

> leave group request
> ---
>
> Key: KAFKA-2397
> URL: https://issues.apache.org/jira/browse/KAFKA-2397
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Onur Karaman
>Assignee: Onur Karaman
>Priority: Minor
> Fix For: 0.8.3
>
>
> Let's say every consumer in a group has session timeout s. Currently, if a 
> consumer leaves the group, the worst case time to stabilize the group is 2s 
> (s to detect the consumer failure + s for the rebalance window). If a 
> consumer instead can declare they are leaving the group, the worst case time 
> to stabilize the group would just be the s associated with the rebalance 
> window.
> This is a low priority optimization!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2397) leave group request

2015-09-04 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731373#comment-14731373
 ] 

Jiangjie Qin commented on KAFKA-2397:
-

[~jkreps] using TCP close to signal disconnect does have merits. It works both 
when the client process crashes and when it closes normally. It is just not 
very clear to me whether it is worth doing here.

The price we pay is that we have to propagate every connection close at the 
network layer to the coordinator. From the server logs I have seen at LinkedIn, 
socket closures are quite frequent. Todd even submitted a patch to change that 
particular log to debug level. They could just be from the ad-hoc SyncProducer 
the old consumer uses to refresh metadata. Maybe I'm over-concerned, but I am a 
bit worried about the noise here.

I also don't know all the cases in which a TCP connection might be closed. A 
proxy was mentioned earlier; maybe some load balancer / firewall / gateway, 
etc. I feel it might be another unnecessary assumption/dependency we introduce 
that does not buy us much.

Another thing I am not sure about is how often an application process crashes 
outright, except when people do a kill -9. In most cases there are multiple 
threads in an application. If an uncaught exception is thrown, usually only 
that thread dies and the process hangs rather than exits, unless people handle 
that explicitly like mirror maker does. In that case, is it reasonable to 
expect client.close() to be called in the application shutdown hook or some 
finally block? (It may not be the case for some other language like C, 
though.) If using TCP close mainly addresses kill -9, it is very likely that 
the session timeout has already been reached by the time people manually kill 
the process.
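For what it's worth, the cooperation asked about above could look roughly like 
the following with the new consumer. This is only a sketch, assuming 
wakeup()/close() behave as currently documented on trunk; note that kill -9 
bypasses shutdown hooks entirely, which is exactly the hard case discussed.

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class GracefulClose {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "example-group");
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            final KafkaConsumer<byte[], byte[]> consumer =
                new KafkaConsumer<byte[], byte[]>(props);
            final Thread mainThread = Thread.currentThread();
            // Runs on normal exit and on SIGTERM, but not on kill -9.
            Runtime.getRuntime().addShutdownHook(new Thread() {
                public void run() {
                    consumer.wakeup(); // interrupts a blocking poll()
                    try {
                        mainThread.join();
                    } catch (InterruptedException e) {
                        // ignore: JVM is shutting down anyway
                    }
                }
            });
            try {
                consumer.subscribe(Collections.singletonList("test"));
                while (true)
                    consumer.poll(1000);
            } catch (org.apache.kafka.common.errors.WakeupException e) {
                // expected on shutdown
            } finally {
                consumer.close(); // the close that signals the coordinator
            }
        }
    }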

> leave group request
> ---
>
> Key: KAFKA-2397
> URL: https://issues.apache.org/jira/browse/KAFKA-2397
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Onur Karaman
>Assignee: Onur Karaman
>Priority: Minor
> Fix For: 0.8.3
>
>
> Let's say every consumer in a group has session timeout s. Currently, if a 
> consumer leaves the group, the worst case time to stabilize the group is 2s 
> (s to detect the consumer failure + s for the rebalance window). If a 
> consumer instead can declare they are leaving the group, the worst case time 
> to stabilize the group would just be the s associated with the rebalance 
> window.
> This is a low priority optimization!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2500) Make logEndOffset available in the 0.8.3 Consumer

2015-09-04 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731378#comment-14731378
 ] 

Jason Gustafson commented on KAFKA-2500:


[~willf] I realized that there is already KAFKA-2076 tracking a similar 
feature. Can you have a look at that issue and see if we can resolve this as a 
duplicate?

> Make logEndOffset available in the 0.8.3 Consumer
> -
>
> Key: KAFKA-2500
> URL: https://issues.apache.org/jira/browse/KAFKA-2500
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Affects Versions: 0.8.3
>Reporter: Will Funnell
>Assignee: Jason Gustafson
>Priority: Critical
> Fix For: 0.8.3
>
>
> Originally created in the old consumer here: 
> https://issues.apache.org/jira/browse/KAFKA-1977
> The requirement is to create a snapshot from the Kafka topic but NOT do 
> continual reads after that point. For example you might be creating a backup 
> of the data to a file.
> This ticket covers the addition of the functionality to the new consumer.
> In order to achieve that, a recommended solution by Joel Koshy and Jay Kreps 
> was to expose the high watermark, as maxEndOffset, from the FetchResponse 
> object through to each MessageAndMetadata object in order to be aware when 
> the consumer has reached the end of each partition.
> The submitted patch achieves this by adding the maxEndOffset to the 
> PartitionTopicInfo, which is updated when a new message arrives in the 
> ConsumerFetcherThread and then exposed in MessageAndMetadata.
> See here for discussion:
> http://search-hadoop.com/m/4TaT4TpJy71



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: Kafka-2440; Use `NetworkClient` instead of `Si...

2015-09-04 Thread ijuma
GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/194

Kafka-2440; Use `NetworkClient` instead of `SimpleConsumer` to fetch data 
from replica



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka 
kafka-2440-use-network-client-in-fetcher

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/194.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #194


commit 6f34f70d5ac9c320cc8655c0de26aeee6cb2b300
Author: Ismael Juma 
Date:   2015-09-04T20:42:41Z

Use `NetworkClient` in `ReplicaFetcherThread`

`ConsumerFetcherThread` still uses `SimpleConsumer`.

commit 57a4c0a691167c503d8a51e3bb604f7b212c4890
Author: Ismael Juma 
Date:   2015-09-04T20:43:30Z

Call `metadataUpdater.maybeHandleDisconnection` from `NetworkClient.close()`




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-2510) Prevent broker from re-replicating / losing data due to disk misconfiguration

2015-09-04 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731393#comment-14731393
 ] 

Jiangjie Qin commented on KAFKA-2510:
-

Hey Gwen, can you elaborate a bit on why the consumer offsets would be lost? Do 
you mean ZooKeeper-based offset storage? My understanding is that if the log 
directory is configured to some wrong new path, the old data directory will 
still have all the data untouched. So it seems there would be no actual data 
loss.

> Prevent broker from re-replicating / losing data due to disk misconfiguration
> -
>
> Key: KAFKA-2510
> URL: https://issues.apache.org/jira/browse/KAFKA-2510
> Project: Kafka
>  Issue Type: Bug
>Reporter: Gwen Shapira
>
> Currently Kafka assumes that whatever it sees in the data directory is the 
> correct state of the data.
> This means that if an admin mistakenly configures Chef to use the wrong data 
> directory, one of the following can happen:
> 1. The broker will replicate a bunch of partitions and take over the network
> 2. If you did this to enough brokers, you can lose entire topics and 
> partitions.
> We have information about existing topics, partitions and their ISR in 
> zookeeper.
> We need a mode in which if a broker starts, is in ISR for a partition and 
> doesn't have any data or directory for the partition, the broker will issue a 
> huge ERROR in the log and refuse to do anything for the partition.
> [~fpj] worked on the problem for ZK and had some ideas on what is required 
> here. 
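As a rough illustration of the proposed guard, here is a sketch only; the 
helper names are hypothetical stand-ins, and the real check would live in the 
broker's startup path:

    import java.io.File;
    import java.util.List;
    import org.apache.kafka.common.TopicPartition;

    public class StartupSanityCheck {
        // Refuse to act for partitions ZooKeeper says this broker is in the
        // ISR for, but for which it has no local data at all.
        static void check(int brokerId, File logDir) {
            for (TopicPartition tp : partitionsWithBrokerInIsr(brokerId)) {
                File dir = new File(logDir, tp.topic() + "-" + tp.partition());
                if (!dir.exists()) {
                    System.err.println("ERROR: broker " + brokerId
                        + " is in ISR for " + tp
                        + " but has no local data; refusing to serve it");
                    refuseToServe(tp);
                }
            }
        }

        // Hypothetical stand-ins for reading ISR state and marking a
        // partition offline.
        static List<TopicPartition> partitionsWithBrokerInIsr(int brokerId) {
            throw new UnsupportedOperationException("read ISR from ZooKeeper");
        }

        static void refuseToServe(TopicPartition tp) {
            // mark the partition offline on this broker
        }
    }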



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2440) Use `NetworkClient` instead of `SimpleConsumer` to fetch data from replica

2015-09-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731401#comment-14731401
 ] 

ASF GitHub Bot commented on KAFKA-2440:
---

GitHub user ijuma reopened a pull request:

https://github.com/apache/kafka/pull/194

KAFKA-2440; Use `NetworkClient` instead of `SimpleConsumer` to fetch data 
from replica



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka 
kafka-2440-use-network-client-in-fetcher

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/194.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #194


commit 6f34f70d5ac9c320cc8655c0de26aeee6cb2b300
Author: Ismael Juma 
Date:   2015-09-04T20:42:41Z

Use `NetworkClient` in `ReplicaFetcherThread`

`ConsumerFetcherThread` still uses `SimpleConsumer`.

commit 57a4c0a691167c503d8a51e3bb604f7b212c4890
Author: Ismael Juma 
Date:   2015-09-04T20:43:30Z

Call `metadataUpdater.maybeHandleDisconnection` from `NetworkClient.close()`




> Use `NetworkClient` instead of `SimpleConsumer` to fetch data from replica
> --
>
> Key: KAFKA-2440
> URL: https://issues.apache.org/jira/browse/KAFKA-2440
> Project: Kafka
>  Issue Type: Sub-task
>  Components: security
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>Priority: Blocker
> Fix For: 0.8.3
>
>
> This is necessary for SSL/TLS support for inter-broker communication as 
> `SimpleConsumer` will not be updated to support SSL/TLS.
> As explained by [~junrao] in KAFKA-2411: we need to be a bit careful since 
> the follower fetcher thread doesn't need to refresh metadata itself. Instead, 
> the information about the leader is propagated from the controller.
> This work was originally described in KAFKA-2411, which was then updated to 
> be more narrowly focused on replacing `BlockingChannel` with `Selector`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-2440; Use `NetworkClient` instead of `Si...

2015-09-04 Thread ijuma
GitHub user ijuma reopened a pull request:

https://github.com/apache/kafka/pull/194

KAFKA-2440; Use `NetworkClient` instead of `SimpleConsumer` to fetch data 
from replica



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka 
kafka-2440-use-network-client-in-fetcher

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/194.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #194


commit 6f34f70d5ac9c320cc8655c0de26aeee6cb2b300
Author: Ismael Juma 
Date:   2015-09-04T20:42:41Z

Use `NetworkClient` in `ReplicaFetcherThread`

`ConsumerFetcherThread` still uses `SimpleConsumer`.

commit 57a4c0a691167c503d8a51e3bb604f7b212c4890
Author: Ismael Juma 
Date:   2015-09-04T20:43:30Z

Call `metadataUpdater.maybeHandleDisconnection` from `NetworkClient.close()`




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request: KAFKA-2440; Use `NetworkClient` instead of `Si...

2015-09-04 Thread ijuma
Github user ijuma closed the pull request at:

https://github.com/apache/kafka/pull/194


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-2440) Use `NetworkClient` instead of `SimpleConsumer` to fetch data from replica

2015-09-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731400#comment-14731400
 ] 

ASF GitHub Bot commented on KAFKA-2440:
---

Github user ijuma closed the pull request at:

https://github.com/apache/kafka/pull/194


> Use `NetworkClient` instead of `SimpleConsumer` to fetch data from replica
> --
>
> Key: KAFKA-2440
> URL: https://issues.apache.org/jira/browse/KAFKA-2440
> Project: Kafka
>  Issue Type: Sub-task
>  Components: security
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>Priority: Blocker
> Fix For: 0.8.3
>
>
> This is necessary for SSL/TLS support for inter-broker communication as 
> `SimpleConsumer` will not be updated to support SSL/TLS.
> As explained by [~junrao] in KAFKA-2411: we need to be a bit careful since 
> the follower fetcher thread doesn't need to refresh metadata itself. Instead, 
> the information about the leader is propagated from the controller.
> This work was originally described in KAFKA-2411, which was then updated to 
> be more narrowly focused on replacing `BlockingChannel` with `Selector`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2510) Prevent broker from re-replicating / losing data due to disk misconfiguration

2015-09-04 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731403#comment-14731403
 ] 

Gwen Shapira commented on KAFKA-2510:
-

The old data directory will be untouched, but there are no real errors and 
everything will appear to be working: new data coming in starting at offset 0, 
consumers happily consuming this data, and a new __offsets topic.

A few hours or days later, when you figure out what happened, you are kind of 
stuck - what are you going to do with the old data directory?

> Prevent broker from re-replicating / losing data due to disk misconfiguration
> -
>
> Key: KAFKA-2510
> URL: https://issues.apache.org/jira/browse/KAFKA-2510
> Project: Kafka
>  Issue Type: Bug
>Reporter: Gwen Shapira
>
> Currently Kafka assumes that whatever it sees in the data directory is the 
> correct state of the data.
> This means that if an admin mistakenly configures Chef to use the wrong data 
> directory, one of the following can happen:
> 1. The broker will replicate a bunch of partitions and take over the network
> 2. If you did this to enough brokers, you can lose entire topics and 
> partitions.
> We have information about existing topics, partitions and their ISR in 
> zookeeper.
> We need a mode in which if a broker starts, is in ISR for a partition and 
> doesn't have any data or directory for the partition, the broker will issue a 
> huge ERROR in the log and refuse to do anything for the partition.
> [~fpj] worked on the problem for ZK and had some ideas on what is required 
> here. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2440) Use `NetworkClient` instead of `SimpleConsumer` to fetch data from replica

2015-09-04 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-2440:
---
Reviewer: Jun Rao
  Status: Patch Available  (was: In Progress)

I submitted a PR for this. `NetworkClient` is used in `ReplicaFetcherThread`, 
while `SimpleConsumer` is still used in `ConsumerFetcherThread`. It would be 
good to get feedback on the approach I took.

[~junrao], I set you as the reviewer. I hope that's OK.

> Use `NetworkClient` instead of `SimpleConsumer` to fetch data from replica
> --
>
> Key: KAFKA-2440
> URL: https://issues.apache.org/jira/browse/KAFKA-2440
> Project: Kafka
>  Issue Type: Sub-task
>  Components: security
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>Priority: Blocker
> Fix For: 0.8.3
>
>
> This is necessary for SSL/TLS support for inter-broker communication as 
> `SimpleConsumer` will not be updated to support SSL/TLS.
> As explained by [~junrao] in KAFKA-2411: we need to be a bit careful since 
> the follower fetcher thread doesn't need to refresh metadata itself. Instead, 
> the information about the leader is propagated from the controller.
> This work was originally described in KAFKA-2411, which was then updated to 
> be more narrowly focused on replacing `BlockingChannel` with `Selector`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2510) Prevent broker from re-replicating / losing data due to disk misconfiguration

2015-09-04 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731497#comment-14731497
 ] 

Jay Kreps commented on KAFKA-2510:
--

I actually think we shouldn't prevent this.

In our replication model the data on disk is basically a cache. If it's there, 
the broker uses it to make its own recovery faster and just pulls the diff from 
the replicas. If it's not there, it recreates it. You are allowed to lose what 
is on disk at any time.

For the chef case, yes, if you botch the directory you will re-replicate data, 
but the same is true of botching the node id or things like the zk URL, etc. 
Replicating will be slow but not fatal. In the rolling restart case you 
actually won't lose data if you use controlled shutdown. If you don't use 
controlled shutdown you will lose data no matter what.

So today, if you want, you can wipe your data and restart, and the broker 
happily re-replicates. If you have a disk failure you can repair it and 
restart. And if your AWS instance disappears you can move it to another one and 
it re-replicates.

We actually intended to exploit this for running in AWS and Mesos more 
elastically when we do automated data balancing. For example, the Mesos folks 
are going to add a feature in Marathon where tasks will be semi-sticky to 
nodes. So if a Kafka node is restarted or dies, Mesos will prefer to restart it 
on the node it was on (if that node is still around and has free slots). If 
not, it will start it elsewhere (where it will, of course, have no data).

> Prevent broker from re-replicating / losing data due to disk misconfiguration
> -
>
> Key: KAFKA-2510
> URL: https://issues.apache.org/jira/browse/KAFKA-2510
> Project: Kafka
>  Issue Type: Bug
>Reporter: Gwen Shapira
>
> Currently Kafka assumes that whatever it sees in the data directory is the 
> correct state of the data.
> This means that if an admin mistakenly configures Chef to use the wrong data 
> directory, one of the following can happen:
> 1. The broker will replicate a bunch of partitions and take over the network
> 2. If you did this to enough brokers, you can lose entire topics and 
> partitions.
> We have information about existing topics, partitions and their ISR in 
> zookeeper.
> We need a mode in which if a broker starts, is in ISR for a partition and 
> doesn't have any data or directory for the partition, the broker will issue a 
> huge ERROR in the log and refuse to do anything for the partition.
> [~fpj] worked on the problem for ZK and had some ideas on what is required 
> here. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-2510) Prevent broker from re-replicating / losing data due to disk misconfiguration

2015-09-04 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731497#comment-14731497
 ] 

Jay Kreps edited comment on KAFKA-2510 at 9/4/15 9:56 PM:
--

I actually think we shouldn't prevent this.

In our replication model the data on disk is basically a cache. If it's there, 
the broker uses it to make its own recovery faster and just pulls the diff from 
the replicas. If it's not there, it recreates it. You are allowed to lose what 
is on disk at any time.

For the chef case, yes, if you botch the directory you will re-replicate data, 
but the same is true of botching the node id or things like the zk URL, etc. 
Replicating will be slow but not fatal. In the rolling restart case you 
actually won't lose data if you use controlled shutdown. If you don't use 
controlled shutdown you will lose data no matter what.

So today, if you want, you can wipe your data and restart, and the broker 
happily re-replicates. If you have a disk failure you can repair it and 
restart. And if your AWS instance disappears you can move it to another one and 
it re-replicates. If you need to rebuild your RAID array, or want to add disks 
and get data onto them, all of this can be accomplished by deleting your data 
and bouncing the instance.

We actually intended to exploit this for running in AWS and Mesos more 
elastically when we do automated data balancing. For example, the Mesos folks 
are going to add a feature in Marathon where tasks will be semi-sticky to 
nodes. So if a Kafka node is restarted or dies, Mesos will prefer to restart it 
on the node it was on (if that node is still around and has free slots). If 
not, it will start it elsewhere (where it will, of course, have no data).


was (Author: jkreps):
I actually think we shouldn't prevent this.

In our replication model the data on disk is basically a cache. If it's there 
the broker uses it to help make its own recovery faster and just pulls the diff 
from replicas. If its not there it recreates it. You are allowed to lose what 
is on disk at any time.

For the chef case, yes, if you botch the directory you will replicate data, but 
the same is true of the old node id as well as things like the zk URL, etc. 
Replicating will be slow but not fatal. The case of rolling restart is not 
correct if you use controlled shutdown. If you don't use controlled shutdown 
you will lose data no matter what.

So today if you want you can wipe your data and restart and the broker happily 
re-replicates. If you have a disk failure you can repair it and restart. And if 
your AWS instance disappears you can move it over to another and it 
re-replicates.

We actually intended to exploit this for running in AWS and Mesos more 
elastically when we do automated data balancing. For example in Mesos the mesos 
guys are going to add a feature in Marathon where tasks will be semi-sticky to 
nodes. So if a Kafka node is restarted or dies mesos will prefer to restart it 
on the node it was on (if that node is still around and has free slots). If not 
it will start it elsewhere (where it will, of course, have no data).

> Prevent broker from re-replicating / losing data due to disk misconfiguration
> -
>
> Key: KAFKA-2510
> URL: https://issues.apache.org/jira/browse/KAFKA-2510
> Project: Kafka
>  Issue Type: Bug
>Reporter: Gwen Shapira
>
> Currently Kafka assumes that whatever it sees in the data directory is the 
> correct state of the data.
> This means that if an admin mistakenly configures Chef to use the wrong data 
> directory, one of the following can happen:
> 1. The broker will replicate a bunch of partitions and take over the network
> 2. If you did this to enough brokers, you can lose entire topics and 
> partitions.
> We have information about existing topics, partitions and their ISR in 
> zookeeper.
> We need a mode in which if a broker starts, is in ISR for a partition and 
> doesn't have any data or directory for the partition, the broker will issue a 
> huge ERROR in the log and refuse to do anything for the partition.
> [~fpj] worked on the problem for ZK and had some ideas on what is required 
> here. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-2211: Adding simpleAclAuthorizer impleme...

2015-09-04 Thread Parth-Brahmbhatt
GitHub user Parth-Brahmbhatt opened a pull request:

https://github.com/apache/kafka/pull/195

KAFKA-2211: Adding simpleAclAuthorizer implementation and test cases.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Parth-Brahmbhatt/kafka KAFKA-2211

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/195.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #195


commit 7be79e12a4106f7c9963e5cd6fcc1d5438f60e46
Author: Parth Brahmbhatt 
Date:   2015-09-04T22:21:06Z

KAFKA-2211: Adding simpleAclAuthorizer implementation and test cases.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-2211) KafkaAuthorizer: Add simpleACLAuthorizer implementation.

2015-09-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731532#comment-14731532
 ] 

ASF GitHub Bot commented on KAFKA-2211:
---

GitHub user Parth-Brahmbhatt opened a pull request:

https://github.com/apache/kafka/pull/195

KAFKA-2211: Adding simpleAclAuthorizer implementation and test cases.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Parth-Brahmbhatt/kafka KAFKA-2211

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/195.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #195


commit 7be79e12a4106f7c9963e5cd6fcc1d5438f60e46
Author: Parth Brahmbhatt 
Date:   2015-09-04T22:21:06Z

KAFKA-2211: Adding simpleAclAuthorizer implementation and test cases.




> KafkaAuthorizer: Add simpleACLAuthorizer implementation.
> 
>
> Key: KAFKA-2211
> URL: https://issues.apache.org/jira/browse/KAFKA-2211
> Project: Kafka
>  Issue Type: Sub-task
>  Components: security
>Reporter: Parth Brahmbhatt
>Assignee: Parth Brahmbhatt
>Priority: Blocker
> Fix For: 0.8.3
>
> Attachments: KAFKA-2211.patch
>
>
> Subtask-2 for Kafka-1688. 
> Please see KIP-11 for details on the out-of-the-box SimpleACLAuthorizer 
> implementation: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-11+-+Authorization+Interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2211) KafkaAuthorizer: Add simpleACLAuthorizer implementation.

2015-09-04 Thread Parth Brahmbhatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Parth Brahmbhatt updated KAFKA-2211:

Status: Patch Available  (was: In Progress)

> KafkaAuthorizer: Add simpleACLAuthorizer implementation.
> 
>
> Key: KAFKA-2211
> URL: https://issues.apache.org/jira/browse/KAFKA-2211
> Project: Kafka
>  Issue Type: Sub-task
>  Components: security
>Reporter: Parth Brahmbhatt
>Assignee: Parth Brahmbhatt
>Priority: Blocker
> Fix For: 0.8.3
>
> Attachments: KAFKA-2211.patch
>
>
> Subtask-2 for Kafka-1688. 
> Please see KIP-11 for details on the out-of-the-box SimpleACLAuthorizer 
> implementation: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-11+-+Authorization+Interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (KAFKA-2211) KafkaAuthorizer: Add simpleACLAuthorizer implementation.

2015-09-04 Thread Parth Brahmbhatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-2211 started by Parth Brahmbhatt.
---
> KafkaAuthorizer: Add simpleACLAuthorizer implementation.
> 
>
> Key: KAFKA-2211
> URL: https://issues.apache.org/jira/browse/KAFKA-2211
> Project: Kafka
>  Issue Type: Sub-task
>  Components: security
>Reporter: Parth Brahmbhatt
>Assignee: Parth Brahmbhatt
>Priority: Blocker
> Fix For: 0.8.3
>
> Attachments: KAFKA-2211.patch
>
>
> Subtask-2 for Kafka-1688. 
> Please see KIP-11 for details on the out-of-the-box SimpleACLAuthorizer 
> implementation: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-11+-+Authorization+Interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2500) Make logEndOffset available in the 0.8.3 Consumer

2015-09-04 Thread Will Funnell (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731538#comment-14731538
 ] 

Will Funnell commented on KAFKA-2500:
-

[~hachikuji] From my understanding, KAFKA-2076 may not exactly solve the use 
case here.

What I need, as solved in KAFKA-1977 for the current consumer API, is the 
ability to consume a full snapshot of all the messages on a log-compacted 
topic, ensuring each key has been consumed at least once.

Although KAFKA-2076 does make this possible, it requires a separate call to 
discover the high watermark. By the time that call has returned, the topic may 
have received further messages; but if the high watermark is returned with 
each message, it's possible to tell whether that message is the last one and 
to stop consuming immediately (see the sketch below).

It would also be very useful to expose the log cleaner point; that way you know 
when you have consumed past the point of missing any possible duplicate keys 
that have since been compacted.
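To illustrate the consuming side under that proposal, here is a sketch; 
highWatermarkOf() stands in for the hypothetical per-record high watermark 
accessor being requested, and nothing here is existing API:

    import java.util.HashSet;
    import java.util.Set;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class SnapshotReader {
        static void snapshot(KafkaConsumer<byte[], byte[]> consumer,
                             Set<TopicPartition> partitions) {
            Set<TopicPartition> done = new HashSet<TopicPartition>();
            while (done.size() < partitions.size()) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(100);
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    store(record);
                    // With the high watermark piggybacked on each record, a
                    // partition can be stopped the moment we reach it, with
                    // no extra round trip.
                    if (record.offset() + 1 >= highWatermarkOf(record))
                        done.add(new TopicPartition(record.topic(),
                                                    record.partition()));
                }
            }
        }

        static void store(ConsumerRecord<byte[], byte[]> r) {
            // write the key/value to the snapshot file
        }

        // Hypothetical accessor for the proposed API.
        static long highWatermarkOf(ConsumerRecord<byte[], byte[]> r) {
            throw new UnsupportedOperationException("proposed, not implemented");
        }
    }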

> Make logEndOffset available in the 0.8.3 Consumer
> -
>
> Key: KAFKA-2500
> URL: https://issues.apache.org/jira/browse/KAFKA-2500
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Affects Versions: 0.8.3
>Reporter: Will Funnell
>Assignee: Jason Gustafson
>Priority: Critical
> Fix For: 0.8.3
>
>
> Originally created in the old consumer here: 
> https://issues.apache.org/jira/browse/KAFKA-1977
> The requirement is to create a snapshot from the Kafka topic but NOT do 
> continual reads after that point. For example you might be creating a backup 
> of the data to a file.
> This ticket covers the addition of the functionality to the new consumer.
> In order to achieve that, a recommended solution by Joel Koshy and Jay Kreps 
> was to expose the high watermark, as maxEndOffset, from the FetchResponse 
> object through to each MessageAndMetadata object in order to be aware when 
> the consumer has reached the end of each partition.
> The submitted patch achieves this by adding the maxEndOffset to the 
> PartitionTopicInfo, which is updated when a new message arrives in the 
> ConsumerFetcherThread and then exposed in MessageAndMetadata.
> See here for discussion:
> http://search-hadoop.com/m/4TaT4TpJy71



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 34493: Patch for KAFKA-2211

2015-09-04 Thread Parth Brahmbhatt


> On Aug. 23, 2015, 9:28 p.m., Jun Rao wrote:
> > core/src/main/scala/kafka/security/auth/SimpleAclAuthorizer.scala, lines 
> > 55-57
> > 
> >
> > To be consistent with how we pass in configs for pluggable components, 
> > we need to get the external properties directly through 
> > KafkaConfig.originals() instead of from an external property file.

Again, all of this code is out of date. We now have Authorizer extending 
Configurable, so we don't get a KafkaConfig instance here. We get the original 
properties map and initialize the variables from that map; we don't expect a 
separate properties file anymore (see the sketch below).
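In other words, roughly this shape. This is a Java illustration of the pattern 
only; the real authorizer lives in Scala, and the property names here are 
illustrative:

    import java.util.Map;
    import org.apache.kafka.common.Configurable;

    public class ExampleAuthorizer implements Configurable {
        private String superUsers = "";
        private boolean allowEveryoneIfNoAcl = false;

        // Called with the broker's raw config map (KafkaConfig.originals()),
        // so no separate properties file is needed.
        @Override
        public void configure(Map<String, ?> configs) {
            Object users = configs.get("super.users");
            if (users != null)
                superUsers = users.toString();
            Object fallback = configs.get("allow.everyone.if.no.acl.found");
            if (fallback != null)
                allowEveryoneIfNoAcl = Boolean.parseBoolean(fallback.toString());
        }
    }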


> On Aug. 23, 2015, 9:28 p.m., Jun Rao wrote:
> > core/src/main/scala/kafka/security/auth/SimpleAclAuthorizer.scala, lines 
> > 66-68
> > 
> >
> > Perhaps this can be moved to ZkUtils.setupCommonPaths()?

This path is kind of optional, as we only need it if someone enables an 
authorizer and chooses to use ours. I think creating this path as part of the 
common paths would be confusing.


> On Aug. 23, 2015, 9:28 p.m., Jun Rao wrote:
> > core/src/main/scala/kafka/security/auth/SimpleAclAuthorizer.scala, lines 
> > 70-72
> > 
> >
> > Hmm, not sure why we need the scheduler. It seems that it's enough to 
> > just read every acl from ZK first on initialization. That's how 
> > topic/client config changes works. What's the re-connection issue that you 
> > mentioned?

Please read http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html, 
section "Things to Remember about Watches". The bottom line is that there are 
pathological cases where a watch can be missed. All I really need is a TTL on 
cache entries; I could add that, or keep the scheduler so it expires all cache 
entries every hour (see the sketch below).
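The safety net could be as simple as the following sketch, where 
loadAllAclsFromZk() is a stand-in for the real ZooKeeper read; a missed watch 
then costs at most one refresh interval of staleness:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class AclCacheRefresher {
        private final Map<String, Object> aclCache =
            new ConcurrentHashMap<String, Object>();
        private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

        // Periodic full reload bounds staleness even if a ZK watch is missed.
        // (A real implementation would swap the map atomically rather than
        // clear-then-put.)
        public void start(long periodHours) {
            scheduler.scheduleAtFixedRate(new Runnable() {
                public void run() {
                    Map<String, Object> fresh = loadAllAclsFromZk();
                    aclCache.clear();
                    aclCache.putAll(fresh);
                }
            }, 0, periodHours, TimeUnit.HOURS);
        }

        // Stand-in for reading every acl from ZooKeeper.
        private Map<String, Object> loadAllAclsFromZk() {
            return new HashMap<String, Object>();
        }
    }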


> On Aug. 23, 2015, 9:28 p.m., Jun Rao wrote:
> > core/src/main/scala/kafka/security/auth/SimpleAclAuthorizer.scala, lines 
> > 110-111
> > 
> >
> > Is this right? It seems the expansion should happen on the acl side. 
> > Basically, if you configure an ACL to read from a topic, you automatically 
> > get an ACL to describe the topic.

I am not sure what you mean by "the expansion should happen on the acl side". 
Do you mean we should add the describe acl when someone adds an acl for read or 
write? That does not seem right, because in that case, when they remove the 
READ or WRITE acl, do we also remove the describe acl, or does removal have to 
be explicit? That seems confusing. I think the current way is cleaner.


> On Aug. 23, 2015, 9:28 p.m., Jun Rao wrote:
> > core/src/main/scala/kafka/security/auth/SimpleAclAuthorizer.scala, line 134
> > 
> >
> > Hmm, the acl should only match if operation is a subset of 
> > acl.operations, right?

Again, the code has changed quite a bit, given an acl only stores one operation 
now.

The original code was doing the right thing: the list had more than one element 
only in the describe case, and in that case the match method was checking 
whether acl.operations had at least one of DESCRIBE, READ, or WRITE.


> On Aug. 23, 2015, 9:28 p.m., Jun Rao wrote:
> > core/src/main/scala/kafka/security/auth/SimpleAclAuthorizer.scala, lines 
> > 220-227
> > 
> >
> > Not sure why we need to read from ZK during reads since writes should 
> > only be propagated from the ZK listner. You can take a look at how 
> > ConfigChangeListener is implemented.

Again, my understanding of how ZooKeeper watches work with zkClient is that it 
is possible to miss a watch, so I don't completely rely on the watches always 
firing. If that understanding is incorrect, I can remove the scheduler thread 
and change the code to only read from ZooKeeper as part of handling the watch 
notification.


- Parth


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34493/#review96128
---


On May 20, 2015, 8:03 p.m., Parth Brahmbhatt wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34493/
> ---
> 
> (Updated May 20, 2015, 8:03 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-2211
> https://issues.apache.org/jira/browse/KAFKA-2211
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-2211: Out of box implementation for authorizer interface.
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/security/auth/SimpleAclAuthorizer.scala 
> PRE-CREATION 
>   core/src/test/resources/authorizer-config.properties PRE-CREATION 
>   core/src/test/scala/unit/kafka/security/auth/SimpleAclAuthorizerTest.scala 
> PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/34493/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Parth Brahmbhatt
> 
>

Re: Review Request 34493: Patch for KAFKA-2211

2015-09-04 Thread Parth Brahmbhatt


> On Aug. 21, 2015, 2:31 p.m., Ismael Juma wrote:
> > Thanks for this Parth. I did an initial pass where I left a number comments 
> > (many of them style-related, see http://kafka.apache.org/coding-guide.html 
> > for reference). I know, we should have a tool that checks some of these 
> > things automatically. That is my main priority after we get the security 
> > stuff in shape.
> > 
> > I think it would be useful if you had a look and made the changes (if you 
> > agree) to the cases I pointed out and similar ones. I noticed that 
> > KAFKA-2212 has some similar issues too, it may be worth taking a pass there 
> > too.
> > 
> > I will look at these two patches again early next week.
> 
> Parth Brahmbhatt wrote:
> Hey, thanks for reviewing this. This patch needs to be updated with all 
> the changes we have made in 2210. As 2210 was moving slowly and was kind 
> of a moving target, I did not update this patch. I think I have gained a 
> little more understanding of Scala style from the 2210 review, so I will 
> incorporate all of that in this patch.
> 
> Ismael Juma wrote:
> Thank you Parth!

Is it OK if I post a pull request with the new changes instead of using Review 
Board?


> On Aug. 21, 2015, 2:31 p.m., Ismael Juma wrote:
> > core/src/main/scala/kafka/security/auth/SimpleAclAuthorizer.scala, line 118
> > 
> >
> > In Scala, `return` is normally avoided because it makes it harder to 
> > reason about some code in isolation. Could this code be rewritten in a more 
> > readable way without using `return`? The answer may well be no, interested 
> > in your thoughts.

I think I have addressed all your comments but this one. This seems like one of 
those cases where the code is more readable with a return statement, given 
there are so many different checks and at the end of each check we may already 
have the final result.


- Parth


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34493/#review96042
---


On May 20, 2015, 8:03 p.m., Parth Brahmbhatt wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34493/
> ---
> 
> (Updated May 20, 2015, 8:03 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-2211
> https://issues.apache.org/jira/browse/KAFKA-2211
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-2211: Out of box implementation for authorizer interface.
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/security/auth/SimpleAclAuthorizer.scala 
> PRE-CREATION 
>   core/src/test/resources/authorizer-config.properties PRE-CREATION 
>   core/src/test/scala/unit/kafka/security/auth/SimpleAclAuthorizerTest.scala 
> PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/34493/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Parth Brahmbhatt
> 
>



[jira] [Commented] (KAFKA-2120) Add a request timeout to NetworkClient

2015-09-04 Thread Tapasi Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731570#comment-14731570
 ] 

Tapasi Paul commented on KAFKA-2120:


Hello folks,

We are facing the producer network client spinning issue reported in this JIRA 
(release 0.8.2.1) when Kafka is down. It looks like a patch is available, and I 
see that release 0.8.2.2 is now in the distribution. Can you please tell me 
whether this timeout fix is included in 0.8.2.2? If not, when will it be 
available?

Thanks

> Add a request timeout to NetworkClient
> --
>
> Key: KAFKA-2120
> URL: https://issues.apache.org/jira/browse/KAFKA-2120
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Jiangjie Qin
>Assignee: Mayuresh Gharat
>Priority: Blocker
> Fix For: 0.8.3
>
> Attachments: KAFKA-2120.patch, KAFKA-2120_2015-07-27_15:31:19.patch, 
> KAFKA-2120_2015-07-29_15:57:02.patch, KAFKA-2120_2015-08-10_19:55:18.patch, 
> KAFKA-2120_2015-08-12_10:59:09.patch, KAFKA-2120_2015-09-03_15:12:02.patch
>
>
> Currently NetworkClient does not have a timeout setting for requests, so if 
> no response is received for a request, for reasons such as the broker being 
> down, the request will never be completed.
> The request timeout will also be used as an implicit timeout for some methods 
> such as KafkaProducer.flush() and KafkaProducer.close().
> KIP-19 is created for this public interface change.
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-19+-+Add+a+request+timeout+to+NetworkClient
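For reference, here is a sketch of what the change looks like from the user's 
side, assuming the config name proposed in KIP-19 (request.timeout.ms); see 
the KIP page above for the final name and semantics:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class TimeoutConfigExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
            props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
            // Proposed in KIP-19: requests with no response within this bound
            // fail instead of hanging forever when a broker is down.
            props.put("request.timeout.ms", "30000");
            KafkaProducer<byte[], byte[]> producer =
                new KafkaProducer<byte[], byte[]>(props);
            producer.send(new ProducerRecord<byte[], byte[]>(
                "test", "hello".getBytes()));
            producer.close(); // flush()/close() are implicitly bounded too
        }
    }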



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Contributor List

2015-09-04 Thread Bill Bejeck
Hi, can I get added to the contributor list? I'd like to take a crack at 
KAFKA-2058 

Thanks!

Bill Bejeck


Re: Contributor List

2015-09-04 Thread Gwen Shapira
Thank you!

Please create a Jira user if you didn't do so already, and let me know your
user name. I'll add you to the list.

On Fri, Sep 4, 2015 at 4:33 PM, Bill Bejeck  wrote:

> Hi can i get added to the contributor list? I'd like to take crack at
> KAFKA-2058 
>
> Thanks!
>
> Bill Bejeck
>


[jira] [Commented] (KAFKA-2510) Prevent broker from re-replicating / losing data due to disk misconfiguration

2015-09-04 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731629#comment-14731629
 ] 

Gwen Shapira commented on KAFKA-2510:
-

[~jkreps] I agree that this capability is important for some use cases.
I also think that preventing sysadmins from accidentally losing data on an 
entire cluster is an important thing to support.

Which is why I think it should be configurable (like delete.topic.enable and 
unclean.leader.election.enable) - there are legitimate trade-offs.



> Prevent broker from re-replicating / losing data due to disk misconfiguration
> -
>
> Key: KAFKA-2510
> URL: https://issues.apache.org/jira/browse/KAFKA-2510
> Project: Kafka
>  Issue Type: Bug
>Reporter: Gwen Shapira
>
> Currently Kafka assumes that whatever it sees in the data directory is the 
> correct state of the data.
> This means that if an admin mistakenly configures Chef to use the wrong data 
> directory, one of the following can happen:
> 1. The broker will replicate a bunch of partitions and take over the network
> 2. If you did this to enough brokers, you can lose entire topics and 
> partitions.
> We have information about existing topics, partitions and their ISR in 
> zookeeper.
> We need a mode in which if a broker starts, is in ISR for a partition and 
> doesn't have any data or directory for the partition, the broker will issue a 
> huge ERROR in the log and refuse to do anything for the partition.
> [~fpj] worked on the problem for ZK and had some ideas on what is required 
> here. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 36858: Patch for KAFKA-2120

2015-09-04 Thread Mayuresh Gharat

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36858/
---

(Updated Sept. 5, 2015, 12:49 a.m.)


Review request for kafka.


Bugs: KAFKA-2120
https://issues.apache.org/jira/browse/KAFKA-2120


Repository: kafka


Description (updated)
---

Solved compile error


Addressed Jason's comments for Kip-19


Addressed Jun's comments


Addressed Jason's comments about the default values for requestTimeout


checkpoint


Addressed Joel's concerns. Also tried to include Jun's feedback.


Fixed a minor comment


Solved unittest issue


Addressed Jun's comments regarding NetworkClient


Diffs (updated)
-

  clients/src/main/java/org/apache/kafka/clients/ClientRequest.java 
dc8f0f115bcda893c95d17c0a57be8d14518d034 
  clients/src/main/java/org/apache/kafka/clients/CommonClientConfigs.java 
7d24c6f5dd2b63b96584f3aa8922a1d048dc1ae4 
  clients/src/main/java/org/apache/kafka/clients/InFlightRequests.java 
15d00d4e484bb5d51a9ae6857ed6e024a2cc1820 
  clients/src/main/java/org/apache/kafka/clients/KafkaClient.java 
f46c0d9b5eb73887c62a0e09c96e9d8c964c709d 
  clients/src/main/java/org/apache/kafka/clients/NetworkClient.java 
049b22eadd5496b70dfcfd9d821f67c62c68a052 
  clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerConfig.java 
b9a2d4e2bc565f0ee72b27791afe5c894af262f1 
  clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java 
73237e455a9e5aa38672522cfd9e5fcdafbcef3b 
  
clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerNetworkClient.java
 9517d9d0cd480d5ba1d12f1fde7963e60528d2f8 
  clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java 
804d569498396d431880641041fc9292076452cb 
  clients/src/main/java/org/apache/kafka/clients/producer/ProducerConfig.java 
06f00a99a73a288df9afa8c1d4abe3580fa968a6 
  
clients/src/main/java/org/apache/kafka/clients/producer/internals/BufferPool.java
 4cb1e50d6c4ed55241aeaef1d3af09def5274103 
  
clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java
 a152bd7697dca55609a9ec4cfe0a82c10595fbc3 
  
clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordBatch.java
 06182db1c3a5da85648199b4c0c98b80ea7c6c0c 
  clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java 
d2e64f7cd8bf56e433a210905b2874f71eee9ea0 
  clients/src/main/java/org/apache/kafka/common/network/Selector.java 
4aa5cbb86ce6e1bf8f6769147ee2a6452c855c74 
  clients/src/test/java/org/apache/kafka/clients/MockClient.java 
e5815f56bdf8e2d980f2bc36b831ed234c0ac781 
  clients/src/test/java/org/apache/kafka/clients/NetworkClientTest.java 
69c93c3adf674b1640534c3d7410fcaafaf2232c 
  
clients/src/test/java/org/apache/kafka/clients/producer/internals/BufferPoolTest.java
 2c693824fa53db1e38766b8c66a0ef42ef9d0f3a 
  
clients/src/test/java/org/apache/kafka/clients/producer/internals/RecordAccumulatorTest.java
 5b2e4ffaeab7127648db608c179703b27b577414 
  
clients/src/test/java/org/apache/kafka/clients/producer/internals/SenderTest.java
 aa44991777a855f4b7f4f7bf17107c69393ff8ff 
  clients/src/test/java/org/apache/kafka/test/MockSelector.java 
f83fd9b794a3bd191121a22bcb40fd6ec31d83b5 
  core/src/main/scala/kafka/controller/ControllerChannelManager.scala 
da1cff07f7f76dcfa5a805718febcccd4ed5f578 
  core/src/main/scala/kafka/server/KafkaConfig.scala 
1e8b2331486ffe55bfcc0919e48e12aad23b7d3c 
  core/src/main/scala/kafka/server/KafkaServer.scala 
30406ce809caaac56aca1f30c235b35962d55a50 
  core/src/main/scala/kafka/tools/ProducerPerformance.scala 
46a68e97b8bcc8821f21e4220ce9b3acedc5dafe 
  core/src/main/scala/kafka/utils/NetworkClientBlockingOps.scala 
ad10721de844725f27a116611209992cea61b088 
  core/src/test/scala/integration/kafka/api/ProducerFailureHandlingTest.scala 
1198df02ddd7727269e84a751ba99520f6d5584a 
  core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala 
5b4f2db4607ae6d17696c1140f1a771ce75c80e0 
  core/src/test/scala/unit/kafka/utils/TestUtils.scala 
09b8444c2add87f0f70dbb182e892977a6b5c243 

Diff: https://reviews.apache.org/r/36858/diff/


Testing
---


Thanks,

Mayuresh Gharat



[jira] [Updated] (KAFKA-2120) Add a request timeout to NetworkClient

2015-09-04 Thread Mayuresh Gharat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayuresh Gharat updated KAFKA-2120:
---
Attachment: KAFKA-2120_2015-09-04_17:49:01.patch

> Add a request timeout to NetworkClient
> --
>
> Key: KAFKA-2120
> URL: https://issues.apache.org/jira/browse/KAFKA-2120
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Jiangjie Qin
>Assignee: Mayuresh Gharat
>Priority: Blocker
> Fix For: 0.8.3
>
> Attachments: KAFKA-2120.patch, KAFKA-2120_2015-07-27_15:31:19.patch, 
> KAFKA-2120_2015-07-29_15:57:02.patch, KAFKA-2120_2015-08-10_19:55:18.patch, 
> KAFKA-2120_2015-08-12_10:59:09.patch, KAFKA-2120_2015-09-03_15:12:02.patch, 
> KAFKA-2120_2015-09-04_17:49:01.patch
>
>
> Currently NetworkClient does not have a timeout setting for requests, so if 
> no response is received for a request, for reasons such as the broker being 
> down, the request will never be completed.
> The request timeout will also be used as an implicit timeout for some methods 
> such as KafkaProducer.flush() and KafkaProducer.close().
> KIP-19 is created for this public interface change.
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-19+-+Add+a+request+timeout+to+NetworkClient



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2120) Add a request timeout to NetworkClient

2015-09-04 Thread Mayuresh Gharat (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731670#comment-14731670
 ] 

Mayuresh Gharat commented on KAFKA-2120:


Updated reviewboard https://reviews.apache.org/r/36858/diff/
 against branch origin/trunk

> Add a request timeout to NetworkClient
> --
>
> Key: KAFKA-2120
> URL: https://issues.apache.org/jira/browse/KAFKA-2120
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Jiangjie Qin
>Assignee: Mayuresh Gharat
>Priority: Blocker
> Fix For: 0.8.3
>
> Attachments: KAFKA-2120.patch, KAFKA-2120_2015-07-27_15:31:19.patch, 
> KAFKA-2120_2015-07-29_15:57:02.patch, KAFKA-2120_2015-08-10_19:55:18.patch, 
> KAFKA-2120_2015-08-12_10:59:09.patch, KAFKA-2120_2015-09-03_15:12:02.patch, 
> KAFKA-2120_2015-09-04_17:49:01.patch
>
>
> Currently NetworkClient does not have a timeout setting for requests, so if 
> no response is received for a request, for reasons such as the broker being 
> down, the request will never be completed.
> The request timeout will also be used as an implicit timeout for some methods 
> such as KafkaProducer.flush() and KafkaProducer.close().
> KIP-19 is created for this public interface change.
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-19+-+Add+a+request+timeout+to+NetworkClient



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)