[jira] [Updated] (KAFKA-5416) TransactionCoordinator seems to not return NOT_COORDINATOR error

2017-06-08 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta updated KAFKA-5416:

Priority: Blocker  (was: Major)

> TransactionCoordinator seems to not return NOT_COORDINATOR error
> 
>
> Key: KAFKA-5416
> URL: https://issues.apache.org/jira/browse/KAFKA-5416
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Priority: Blocker
> Fix For: 0.11.0.0
>
>
> In regard to this system test: 
> http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/2017-06-09--001.1496974430--apurvam--MINOR-add-logging-to-transaction-coordinator-in-all-failure-cases--02b3816/TransactionsTest/test_transactions/failure_mode=clean_bounce.bounce_target=brokers/4.tgz
> Here are the ownership changes for __transaction_state-37: 
> {noformat}
> ./worker1/debug/server.log:3623:[2017-06-09 01:16:36,551] INFO [Transaction 
> Log Manager 2]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:16174:[2017-06-09 01:16:39,963] INFO [Transaction 
> Log Manager 2]: Loading transaction metadata from __transaction_state-37 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:16194:[2017-06-09 01:16:39,978] INFO [Transaction 
> Log Manager 2]: Finished loading 1 transaction metadata from 
> __transaction_state-37 in 15 milliseconds 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:35633:[2017-06-09 01:16:45,308] INFO [Transaction 
> Log Manager 2]: Removed 1 cached transaction metadata for 
> __transaction_state-37 on follower transition 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:40353:[2017-06-09 01:16:52,218] INFO [Transaction 
> Log Manager 2]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:40664:[2017-06-09 01:16:52,683] INFO [Transaction 
> Log Manager 2]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:146044:[2017-06-09 01:19:08,844] INFO [Transaction 
> Log Manager 2]: Loading transaction metadata from __transaction_state-37 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:146048:[2017-06-09 01:19:08,850] INFO [Transaction 
> Log Manager 2]: Finished loading 1 transaction metadata from 
> __transaction_state-37 in 6 milliseconds 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:150147:[2017-06-09 01:19:10,995] INFO [Transaction 
> Log Manager 2]: Removed 1 cached transaction metadata for 
> __transaction_state-37 on follower transition 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:3413:[2017-06-09 01:16:36,109] INFO [Transaction 
> Log Manager 1]: Loading transaction metadata from __transaction_state-37 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:20889:[2017-06-09 01:16:39,991] INFO [Transaction 
> Log Manager 1]: Removed 1 cached transaction metadata for 
> __transaction_state-37 on follower transition 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:26084:[2017-06-09 01:16:46,682] INFO [Transaction 
> Log Manager 1]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:26322:[2017-06-09 01:16:47,153] INFO [Transaction 
> Log Manager 1]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:27320:[2017-06-09 01:16:50,748] INFO [Transaction 
> Log Manager 1]: Loading transaction m

[jira] [Updated] (KAFKA-5416) TransactionCoordinator seems to not return NOT_COORDINATOR error

2017-06-08 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta updated KAFKA-5416:

Fix Version/s: 0.11.0.0

> TransactionCoordinator seems to not return NOT_COORDINATOR error
> 
>
> Key: KAFKA-5416
> URL: https://issues.apache.org/jira/browse/KAFKA-5416
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
> Fix For: 0.11.0.0
>
>
> In regard to this system test: 
> http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/2017-06-09--001.1496974430--apurvam--MINOR-add-logging-to-transaction-coordinator-in-all-failure-cases--02b3816/TransactionsTest/test_transactions/failure_mode=clean_bounce.bounce_target=brokers/4.tgz
> Here are the ownership changes for __transaction_state-37: 
> {noformat}
> ./worker1/debug/server.log:3623:[2017-06-09 01:16:36,551] INFO [Transaction 
> Log Manager 2]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:16174:[2017-06-09 01:16:39,963] INFO [Transaction 
> Log Manager 2]: Loading transaction metadata from __transaction_state-37 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:16194:[2017-06-09 01:16:39,978] INFO [Transaction 
> Log Manager 2]: Finished loading 1 transaction metadata from 
> __transaction_state-37 in 15 milliseconds 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:35633:[2017-06-09 01:16:45,308] INFO [Transaction 
> Log Manager 2]: Removed 1 cached transaction metadata for 
> __transaction_state-37 on follower transition 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:40353:[2017-06-09 01:16:52,218] INFO [Transaction 
> Log Manager 2]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:40664:[2017-06-09 01:16:52,683] INFO [Transaction 
> Log Manager 2]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:146044:[2017-06-09 01:19:08,844] INFO [Transaction 
> Log Manager 2]: Loading transaction metadata from __transaction_state-37 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:146048:[2017-06-09 01:19:08,850] INFO [Transaction 
> Log Manager 2]: Finished loading 1 transaction metadata from 
> __transaction_state-37 in 6 milliseconds 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:150147:[2017-06-09 01:19:10,995] INFO [Transaction 
> Log Manager 2]: Removed 1 cached transaction metadata for 
> __transaction_state-37 on follower transition 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:3413:[2017-06-09 01:16:36,109] INFO [Transaction 
> Log Manager 1]: Loading transaction metadata from __transaction_state-37 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:20889:[2017-06-09 01:16:39,991] INFO [Transaction 
> Log Manager 1]: Removed 1 cached transaction metadata for 
> __transaction_state-37 on follower transition 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:26084:[2017-06-09 01:16:46,682] INFO [Transaction 
> Log Manager 1]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:26322:[2017-06-09 01:16:47,153] INFO [Transaction 
> Log Manager 1]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:27320:[2017-06-09 01:16:50,748] INFO [Transaction 
> Log Manager 1]: Loading transaction metadata from __transaction_state-37 
> 

[jira] [Updated] (KAFKA-5416) TransactionCoordinator seems to not return NOT_COORDINATOR error

2017-06-08 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta updated KAFKA-5416:

Description: 
In regard to this system test: 
http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/2017-06-09--001.1496974430--apurvam--MINOR-add-logging-to-transaction-coordinator-in-all-failure-cases--02b3816/TransactionsTest/test_transactions/failure_mode=clean_bounce.bounce_target=brokers/4.tgz

Here are the ownership changes for __transaction_state-37: 
{noformat}
./worker1/debug/server.log:3623:[2017-06-09 01:16:36,551] INFO [Transaction Log 
Manager 2]: Trying to remove cached transaction metadata for 
__transaction_state-37 on follower transition but there is no entries 
remaining; it is likely that another process for removing the cached entries 
has just executed earlier before 
(kafka.coordinator.transaction.TransactionStateManager)
./worker1/debug/server.log:16174:[2017-06-09 01:16:39,963] INFO [Transaction 
Log Manager 2]: Loading transaction metadata from __transaction_state-37 
(kafka.coordinator.transaction.TransactionStateManager)
./worker1/debug/server.log:16194:[2017-06-09 01:16:39,978] INFO [Transaction 
Log Manager 2]: Finished loading 1 transaction metadata from 
__transaction_state-37 in 15 milliseconds 
(kafka.coordinator.transaction.TransactionStateManager)
./worker1/debug/server.log:35633:[2017-06-09 01:16:45,308] INFO [Transaction 
Log Manager 2]: Removed 1 cached transaction metadata for 
__transaction_state-37 on follower transition 
(kafka.coordinator.transaction.TransactionStateManager)
./worker1/debug/server.log:40353:[2017-06-09 01:16:52,218] INFO [Transaction 
Log Manager 2]: Trying to remove cached transaction metadata for 
__transaction_state-37 on follower transition but there is no entries 
remaining; it is likely that another process for removing the cached entries has 
just executed earlier before 
(kafka.coordinator.transaction.TransactionStateManager)
./worker1/debug/server.log:40664:[2017-06-09 01:16:52,683] INFO [Transaction 
Log Manager 2]: Trying to remove cached transaction metadata for 
__transaction_state-37 on follower transition but there is no entries 
remaining; it is likely that another process for removing the cached entries has 
just executed earlier before 
(kafka.coordinator.transaction.TransactionStateManager)
./worker1/debug/server.log:146044:[2017-06-09 01:19:08,844] INFO [Transaction 
Log Manager 2]: Loading transaction metadata from __transaction_state-37 
(kafka.coordinator.transaction.TransactionStateManager)
./worker1/debug/server.log:146048:[2017-06-09 01:19:08,850] INFO [Transaction 
Log Manager 2]: Finished loading 1 transaction metadata from 
__transaction_state-37 in 6 milliseconds 
(kafka.coordinator.transaction.TransactionStateManager)
./worker1/debug/server.log:150147:[2017-06-09 01:19:10,995] INFO [Transaction 
Log Manager 2]: Removed 1 cached transaction metadata for 
__transaction_state-37 on follower transition 
(kafka.coordinator.transaction.TransactionStateManager)
./worker2/debug/server.log:3413:[2017-06-09 01:16:36,109] INFO [Transaction Log 
Manager 1]: Loading transaction metadata from __transaction_state-37 
(kafka.coordinator.transaction.TransactionStateManager)
./worker2/debug/server.log:20889:[2017-06-09 01:16:39,991] INFO [Transaction 
Log Manager 1]: Removed 1 cached transaction metadata for 
__transaction_state-37 on follower transition 
(kafka.coordinator.transaction.TransactionStateManager)
./worker2/debug/server.log:26084:[2017-06-09 01:16:46,682] INFO [Transaction 
Log Manager 1]: Trying to remove cached transaction metadata for 
__transaction_state-37 on follower transition but there is no entries 
remaining; it is likely that another process for removing the cached entries has 
just executed earlier before 
(kafka.coordinator.transaction.TransactionStateManager)
./worker2/debug/server.log:26322:[2017-06-09 01:16:47,153] INFO [Transaction 
Log Manager 1]: Trying to remove cached transaction metadata for 
__transaction_state-37 on follower transition but there is no entries 
remaining; it is likely that another process for removing the cached entries has 
just executed earlier before 
(kafka.coordinator.transaction.TransactionStateManager)
./worker2/debug/server.log:27320:[2017-06-09 01:16:50,748] INFO [Transaction 
Log Manager 1]: Loading transaction metadata from __transaction_state-37 
(kafka.coordinator.transaction.TransactionStateManager)
./worker2/debug/server.log:27331:[2017-06-09 01:16:50,775] INFO [Transaction 
Log Manager 1]: Finished loading 1 transaction metadata from 
__transaction_state-37 in 27 milliseconds 
(kafka.coordinator.transaction.TransactionStateManager)
./worker2/debug/server.log:167429:[2017-06-09 01:19:08,857] INFO [Transaction 
Log Manager 1]: Removed 1 cached transaction metadata for 
__transaction_state-37 on follower transition 
(kafka.coordinator.transaction.T

[jira] [Updated] (KAFKA-5416) TransactionCoordinator seems to not return NOT_COORDINATOR error

2017-06-08 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta updated KAFKA-5416:

Description: 
In regard to this system test: 
http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/2017-06-09--001.1496974430--apurvam--MINOR-add-logging-to-transaction-coordinator-in-all-failure-cases--02b3816/TransactionsTest/test_transactions/failure_mode=clean_bounce.bounce_target=brokers/4.tgz

Here are the ownership changes for __transaction_state-37: 
{noformat}
./worker1/debug/server.log:3623:[2017-06-09 01:16:36,551] INFO [Transaction Log 
Manager 2]: Trying to remove cached transaction metadata for 
__transaction_state-37 on follower transition but there is no entries 
remaining; it is likely that another process for removing the cached entries 
has just executed earlier before 
(kafka.coordinator.transaction.TransactionStateManager)
./worker1/debug/server.log:16174:[2017-06-09 01:16:39,963] INFO [Transaction 
Log Manager 2]: Loading transaction metadata from __transaction_state-37 
(kafka.coordinator.transaction.TransactionStateManager)
./worker1/debug/server.log:16194:[2017-06-09 01:16:39,978] INFO [Transaction 
Log Manager 2]: Finished loading 1 transaction metadata from 
__transaction_state-37 in 15 milliseconds 
(kafka.coordinator.transaction.TransactionStateManager)
./worker1/debug/server.log:35633:[2017-06-09 01:16:45,308] INFO [Transaction 
Log Manager 2]: Removed 1 cached transaction metadata for 
__transaction_state-37 on follower transition 
(kafka.coordinator.transaction.TransactionStateManager)
./worker1/debug/server.log:40353:[2017-06-09 01:16:52,218] INFO [Transaction 
Log Manager 2]: Trying to remove cached transaction metadata for 
__transaction_state-37 on follower transition but there is no entries 
remaining; it is likely that another process for removing the cached entries has 
just executed earlier before 
(kafka.coordinator.transaction.TransactionStateManager)
./worker1/debug/server.log:40664:[2017-06-09 01:16:52,683] INFO [Transaction 
Log Manager 2]: Trying to remove cached transaction metadata for 
__transaction_state-37 on follower transition but there is no entries 
remaining; it is likely that another process for removing the cached entries has 
just executed earlier before 
(kafka.coordinator.transaction.TransactionStateManager)
./worker1/debug/server.log:146044:[2017-06-09 01:19:08,844] INFO [Transaction 
Log Manager 2]: Loading transaction metadata from __transaction_state-37 
(kafka.coordinator.transaction.TransactionStateManager)
./worker1/debug/server.log:146048:[2017-06-09 01:19:08,850] INFO [Transaction 
Log Manager 2]: Finished loading 1 transaction metadata from 
__transaction_state-37 in 6 milliseconds 
(kafka.coordinator.transaction.TransactionStateManager)
./worker1/debug/server.log:150147:[2017-06-09 01:19:10,995] INFO [Transaction 
Log Manager 2]: Removed 1 cached transaction metadata for 
__transaction_state-37 on follower transition 
(kafka.coordinator.transaction.TransactionStateManager)
./worker2/debug/server.log:3413:[2017-06-09 01:16:36,109] INFO [Transaction Log 
Manager 1]: Loading transaction metadata from __transaction_state-37 
(kafka.coordinator.transaction.TransactionStateManager)
./worker2/debug/server.log:20889:[2017-06-09 01:16:39,991] INFO [Transaction 
Log Manager 1]: Removed 1 cached transaction metadata for 
__transaction_state-37 on follower transition 
(kafka.coordinator.transaction.TransactionStateManager)
./worker2/debug/server.log:26084:[2017-06-09 01:16:46,682] INFO [Transaction 
Log Manager 1]: Trying to remove cached transaction metadata for 
__transaction_state-37 on follower transition but there is no entries 
remaining; it is likely that another process for removing the cached entries has 
just executed earlier before 
(kafka.coordinator.transaction.TransactionStateManager)
./worker2/debug/server.log:26322:[2017-06-09 01:16:47,153] INFO [Transaction 
Log Manager 1]: Trying to remove cached transaction metadata for 
__transaction_state-37 on follower transition but there is no entries 
remaining; it is likely that another process for removing the cached entries has 
just executed earlier before 
(kafka.coordinator.transaction.TransactionStateManager)
./worker2/debug/server.log:27320:[2017-06-09 01:16:50,748] INFO [Transaction 
Log Manager 1]: Loading transaction metadata from __transaction_state-37 
(kafka.coordinator.transaction.TransactionStateManager)
./worker2/debug/server.log:27331:[2017-06-09 01:16:50,775] INFO [Transaction 
Log Manager 1]: Finished loading 1 transaction metadata from 
__transaction_state-37 in 27 milliseconds 
(kafka.coordinator.transaction.TransactionStateManager)
./worker2/debug/server.log:167429:[2017-06-09 01:19:08,857] INFO [Transaction 
Log Manager 1]: Removed 1 cached transaction metadata for 
__transaction_state-37 on follower transition 
(kafka.coordinator.transaction.T

[jira] [Commented] (KAFKA-5416) TransactionCoordinator seems to not return NOT_COORDINATOR error

2017-06-08 Thread Apurva Mehta (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044045#comment-16044045
 ] 

Apurva Mehta commented on KAFKA-5416:
-

cc [~guozhang] [~damianguy] 

A second pair of eyes making sense of these logs would help a lot.

> TransactionCoordinator seems to not return NOT_COORDINATOR error
> 
>
> Key: KAFKA-5416
> URL: https://issues.apache.org/jira/browse/KAFKA-5416
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>
> In regard to this system test: 
> http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/2017-06-09--001.1496974430--apurvam--MINOR-add-logging-to-transaction-coordinator-in-all-failure-cases--02b3816/TransactionsTest/test_transactions/failure_mode=clean_bounce.bounce_target=brokers/4.tgz
> There are two issues:
> First, a coordinator who is not the owner for a given partition of the 
> transaction log does not return NOT_COORDINATOR, but CONCURRENT_TRANSACTIONS 
> instead.
> Here are the ownership changes for __transaction_state-41: 
> {noformat}
> ./worker1/debug/server.log:3559:[2017-06-09 01:16:36,244] INFO [Transaction 
> Log Manager 2]: Loading transaction metadata from __transaction_state-41 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:38471:[2017-06-09 01:16:45,910] INFO [Transaction 
> Log Manager 2]: Removed 1 cached transaction metadata for 
> __transaction_state-41 on follower transition 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:40226:[2017-06-09 01:16:51,821] INFO [Transaction 
> Log Manager 2]: Trying to remove cached transaction metadata for 
> __transaction_state-41 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:41233:[2017-06-09 01:16:53,332] INFO [Transaction 
> Log Manager 2]: Trying to remove cached transaction metadata for 
> __transaction_state-41 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:42486:[2017-06-09 01:16:59,584] INFO [Transaction 
> Log Manager 2]: Loading transaction metadata from __transaction_state-41 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:42515:[2017-06-09 01:16:59,611] INFO [Transaction 
> Log Manager 2]: Finished loading 1 transaction metadata from 
> __transaction_state-41 in 27 milliseconds 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:153029:[2017-06-09 01:19:11,484] INFO [Transaction 
> Log Manager 2]: Removed 1 cached transaction metadata for 
> __transaction_state-41 on follower transition 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:3537:[2017-06-09 01:16:36,441] INFO [Transaction 
> Log Manager 1]: Trying to remove cached transaction metadata for 
> __transaction_state-41 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:25957:[2017-06-09 01:16:46,309] INFO [Transaction 
> Log Manager 1]: Trying to remove cached transaction metadata for 
> __transaction_state-41 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:26618:[2017-06-09 01:16:48,164] INFO [Transaction 
> Log Manager 1]: Trying to remove cached transaction metadata for 
> __transaction_state-41 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:27951:[2017-06-09 01:16:51,398] INFO [Transaction 
> Log Manager 1]: Loading transaction metadata from __transaction_state-41 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:27958:[2017-06-09 01:16:51,407] INFO [Transaction 
> Log Manager 1]: Finished loading 1 transaction metadata from 
> __transaction_state-41 in 9 milliseconds 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:80970:[2017-06-09 01:16:59,608] INFO [Transaction 
> Log Manager 1]: Removed 1 cached transac

[jira] [Updated] (KAFKA-5416) TransactionCoordinator seems to not return NOT_COORDINATOR error

2017-06-08 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta updated KAFKA-5416:

Description: 
In regard to this system test: 
http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/2017-06-09--001.1496974430--apurvam--MINOR-add-logging-to-transaction-coordinator-in-all-failure-cases--02b3816/TransactionsTest/test_transactions/failure_mode=clean_bounce.bounce_target=brokers/4.tgz

There are two issues:
First, a coordinator who is not the owner for a given partition of the 
transaction log does not return NOT_COORDINATOR, but CONCURRENT_TRANSACTIONS 
instead.
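
To make the expected behavior concrete, here is a minimal, hypothetical Java sketch of the ownership check the coordinator should perform before answering a transactional request. The real coordinator code is Scala inside the broker; the class, method, and partition-mapping details below are illustrative assumptions, and only the error-code names come from the protocol.
{noformat}
// Hypothetical sketch only: not the actual broker implementation.
// Error names mirror the Kafka protocol error codes discussed above.
enum TxnError { NONE, NOT_COORDINATOR, CONCURRENT_TRANSACTIONS }

class CoordinatorOwnershipCheck {
    private final java.util.Set<Integer> ownedTxnLogPartitions; // __transaction_state partitions this broker leads
    private final int numTxnLogPartitions;                      // transaction.state.log.num.partitions (50 by default)

    CoordinatorOwnershipCheck(java.util.Set<Integer> owned, int numPartitions) {
        this.ownedTxnLogPartitions = owned;
        this.numTxnLogPartitions = numPartitions;
    }

    // Expected behavior: if this broker no longer owns the transaction log partition
    // for the transactional id, answer NOT_COORDINATOR so the client rediscovers its
    // coordinator, rather than CONCURRENT_TRANSACTIONS (which the client just retries).
    TxnError checkOwnership(String transactionalId) {
        int partition = (transactionalId.hashCode() & 0x7fffffff) % numTxnLogPartitions;
        return ownedTxnLogPartitions.contains(partition) ? TxnError.NONE : TxnError.NOT_COORDINATOR;
    }
}
{noformat}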

Here are the ownership changes for __transaction_state-41: 
{noformat}
./worker1/debug/server.log:3559:[2017-06-09 01:16:36,244] INFO [Transaction Log 
Manager 2]: Loading transaction metadata from __transaction_state-41 
(kafka.coordinator.transaction.TransactionStateManager)
./worker1/debug/server.log:38471:[2017-06-09 01:16:45,910] INFO [Transaction 
Log Manager 2]: Removed 1 cached transaction metadata for 
__transaction_state-41 on follower transition 
(kafka.coordinator.transaction.TransactionStateManager)
./worker1/debug/server.log:40226:[2017-06-09 01:16:51,821] INFO [Transaction 
Log Manager 2]: Trying to remove cached transaction metadata for 
__transaction_state-41 on follower transition but there is no entries 
remaining; it is likely that another process for removing the cached entries 
has just executed earlier before 
(kafka.coordinator.transaction.TransactionStateManager)
./worker1/debug/server.log:41233:[2017-06-09 01:16:53,332] INFO [Transaction 
Log Manager 2]: Trying to remove cached transaction metadata for 
__transaction_state-41 on follower transition but there is no entries 
remaining; it is likely that another process for removing the cached entries 
has just executed earlier before 
(kafka.coordinator.transaction.TransactionStateManager)
./worker1/debug/server.log:42486:[2017-06-09 01:16:59,584] INFO [Transaction 
Log Manager 2]: Loading transaction metadata from __transaction_state-41 
(kafka.coordinator.transaction.TransactionStateManager)
./worker1/debug/server.log:42515:[2017-06-09 01:16:59,611] INFO [Transaction 
Log Manager 2]: Finished loading 1 transaction metadata from 
__transaction_state-41 in 27 milliseconds 
(kafka.coordinator.transaction.TransactionStateManager)
./worker1/debug/server.log:153029:[2017-06-09 01:19:11,484] INFO [Transaction 
Log Manager 2]: Removed 1 cached transaction metadata for 
__transaction_state-41 on follower transition 
(kafka.coordinator.transaction.TransactionStateManager)
./worker2/debug/server.log:3537:[2017-06-09 01:16:36,441] INFO [Transaction Log 
Manager 1]: Trying to remove cached transaction metadata for 
__transaction_state-41 on follower transition but there is no entries 
remaining; it is likely that another process for removing the cached entries 
has just executed earlier before 
(kafka.coordinator.transaction.TransactionStateManager)
./worker2/debug/server.log:25957:[2017-06-09 01:16:46,309] INFO [Transaction 
Log Manager 1]: Trying to remove cached transaction metadata for 
__transaction_state-41 on follower transition but there is no entries 
remaining; it is likely that another process for removing the cached entries 
has just executed earlier before 
(kafka.coordinator.transaction.TransactionStateManager)
./worker2/debug/server.log:26618:[2017-06-09 01:16:48,164] INFO [Transaction 
Log Manager 1]: Trying to remove cached transaction metadata for 
__transaction_state-41 on follower transition but there is no entries 
remaining; it is likely that another process for removing the cached entries 
has just executed earlier before 
(kafka.coordinator.transaction.TransactionStateManager)
./worker2/debug/server.log:27951:[2017-06-09 01:16:51,398] INFO [Transaction 
Log Manager 1]: Loading transaction metadata from __transaction_state-41 
(kafka.coordinator.transaction.TransactionStateManager)
./worker2/debug/server.log:27958:[2017-06-09 01:16:51,407] INFO [Transaction 
Log Manager 1]: Finished loading 1 transaction metadata from 
__transaction_state-41 in 9 milliseconds 
(kafka.coordinator.transaction.TransactionStateManager)
./worker2/debug/server.log:80970:[2017-06-09 01:16:59,608] INFO [Transaction 
Log Manager 1]: Removed 1 cached transaction metadata for 
__transaction_state-41 on follower transition 
(kafka.coordinator.transaction.TransactionStateManager)
./worker7/debug/server.log:2626:[2017-06-09 01:16:36,882] INFO [Transaction Log 
Manager 3]: Trying to remove cached transaction metadata for 
__transaction_state-41 on follower transition but there is no entries 
remaining; it is likely that another process for removing the cached entries 
has just executed earlier before 
(kafka.coordinator.transaction.TransactionStateManager)
./worker7/debug/server.log:14291:[2017-06-09 01:16:45,909] INFO [Transaction 
Log Manager 3]: Load

[jira] [Created] (KAFKA-5416) TransactionCoordinator seems to not return NOT_COORDINATOR error

2017-06-08 Thread Apurva Mehta (JIRA)
Apurva Mehta created KAFKA-5416:
---

 Summary: TransactionCoordinator seems to not return 
NOT_COORDINATOR error
 Key: KAFKA-5416
 URL: https://issues.apache.org/jira/browse/KAFKA-5416
 Project: Kafka
  Issue Type: Bug
Reporter: Apurva Mehta


In regard to this system test: 
http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/2017-06-09--001.1496974430--apurvam--MINOR-add-logging-to-transaction-coordinator-in-all-failure-cases--02b3816/TransactionsTest/test_transactions/failure_mode=clean_bounce.bounce_target=brokers/4.tgz

There are two issues:
# A coordinator who is not the owner for a given partition of the transaction 
log does not return NOT_COORDINATOR, but CONCURRENT_TRANSACTIONS instead.

Here are the ownership changes for __transaction_state-41: 
{noformat}
./worker1/debug/server.log:3559:[2017-06-09 01:16:36,244] INFO [Transaction Log 
Manager 2]: Loading transaction metadata from __transaction_state-41 
(kafka.coordinator.transaction.TransactionStateManager)
./worker1/debug/server.log:38471:[2017-06-09 01:16:45,910] INFO [Transaction 
Log Manager 2]: Removed 1 cached transaction metadata for 
__transaction_state-41 on follower transition 
(kafka.coordinator.transaction.TransactionStateManager)
./worker1/debug/server.log:40226:[2017-06-09 01:16:51,821] INFO [Transaction 
Log Manager 2]: Trying to remove cached transaction metadata for 
__transaction_state-41 on follower transition but there is no entries 
remaining; it is likely that another process for removing the cached entries 
has just executed earlier before 
(kafka.coordinator.transaction.TransactionStateManager)
./worker1/debug/server.log:41233:[2017-06-09 01:16:53,332] INFO [Transaction 
Log Manager 2]: Trying to remove cached transaction metadata for 
__transaction_state-41 on follower transition but there is no entries 
remaining; it is likely that another process for removing the cached entries 
has just executed earlier before 
(kafka.coordinator.transaction.TransactionStateManager)
./worker1/debug/server.log:42486:[2017-06-09 01:16:59,584] INFO [Transaction 
Log Manager 2]: Loading transaction metadata from __transaction_state-41 
(kafka.coordinator.transaction.TransactionStateManager)
./worker1/debug/server.log:42515:[2017-06-09 01:16:59,611] INFO [Transaction 
Log Manager 2]: Finished loading 1 transaction metadata from 
__transaction_state-41 in 27 milliseconds 
(kafka.coordinator.transaction.TransactionStateManager)
./worker1/debug/server.log:153029:[2017-06-09 01:19:11,484] INFO [Transaction 
Log Manager 2]: Removed 1 cached transaction metadata for 
__transaction_state-41 on follower transition 
(kafka.coordinator.transaction.TransactionStateManager)
./worker2/debug/server.log:3537:[2017-06-09 01:16:36,441] INFO [Transaction Log 
Manager 1]: Trying to remove cached transaction metadata for 
__transaction_state-41 on follower transition but there is no entries 
remaining; it is likely that another process for removing the cached entries 
has just executed earlier before 
(kafka.coordinator.transaction.TransactionStateManager)
./worker2/debug/server.log:25957:[2017-06-09 01:16:46,309] INFO [Transaction 
Log Manager 1]: Trying to remove cached transaction metadata for 
__transaction_state-41 on follower transition but there is no entries 
remaining; it is likely that another process for removing the cached entries 
has just executed earlier before 
(kafka.coordinator.transaction.TransactionStateManager)
./worker2/debug/server.log:26618:[2017-06-09 01:16:48,164] INFO [Transaction 
Log Manager 1]: Trying to remove cached transaction metadata for 
__transaction_state-41 on follower transition but there is no entries 
remaining; it is likely that another process for removing the cached entries 
has just executed earlier before 
(kafka.coordinator.transaction.TransactionStateManager)
./worker2/debug/server.log:27951:[2017-06-09 01:16:51,398] INFO [Transaction 
Log Manager 1]: Loading transaction metadata from __transaction_state-41 
(kafka.coordinator.transaction.TransactionStateManager)
./worker2/debug/server.log:27958:[2017-06-09 01:16:51,407] INFO [Transaction 
Log Manager 1]: Finished loading 1 transaction metadata from 
__transaction_state-41 in 9 milliseconds 
(kafka.coordinator.transaction.TransactionStateManager)
./worker2/debug/server.log:80970:[2017-06-09 01:16:59,608] INFO [Transaction 
Log Manager 1]: Removed 1 cached transaction metadata for 
__transaction_state-41 on follower transition 
(kafka.coordinator.transaction.TransactionStateManager)
./worker7/debug/server.log:2626:[2017-06-09 01:16:36,882] INFO [Transaction Log 
Manager 3]: Trying to remove cached transaction metadata for 
__transaction_state-41 on follower transition but there is no entries 
remaining; it is likely that another process for removing the cached entries 
has just executed earlier before 
(kafka.coordinator.

[jira] [Commented] (KAFKA-3925) Default log.dir=/tmp/kafka-logs is unsafe

2017-06-08 Thread Paolo Patierno (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044002#comment-16044002
 ] 

Paolo Patierno commented on KAFKA-3925:
---

In my opinion, putting the logs in /tmp is great for developers who are just 
starting out with Kafka, and for testing as well: they don't need to remember to 
delete the logs after "playing" around with Kafka. I think it is a good default. 
If people need the logs to persist, they should change the log directory. I 
think that just printing a warning on startup could be enough. 
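
As a rough illustration of that suggestion (this is not existing broker code; the helper name and message text are made up), the startup check could be as simple as:
{noformat}
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

class LogDirSafetyCheck {
    // Warn if any configured log directory lives under /tmp, since many systems
    // periodically clean /tmp and the data could be deleted out from under the broker.
    static void warnIfUnderTmp(List<String> logDirs) {
        Path tmp = Paths.get("/tmp").toAbsolutePath().normalize();
        for (String dir : logDirs) {
            Path p = Paths.get(dir).toAbsolutePath().normalize();
            if (p.startsWith(tmp)) {
                System.err.println("WARNING: log.dir " + dir + " is under /tmp; "
                        + "files there may be deleted by the OS, risking data loss.");
            }
        }
    }
}
{noformat}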

> Default log.dir=/tmp/kafka-logs is unsafe
> -
>
> Key: KAFKA-3925
> URL: https://issues.apache.org/jira/browse/KAFKA-3925
> Project: Kafka
>  Issue Type: Bug
>  Components: config
>Affects Versions: 0.10.0.0
> Environment: Various, depends on OS and configuration
>Reporter: Peter Davis
>
> Many operating systems are configured to delete files under /tmp.  For 
> example Ubuntu has 
> [tmpreaper|http://manpages.ubuntu.com/manpages/wily/man8/tmpreaper.8.html], 
> others use tmpfs, others delete /tmp on startup. 
> Defaults are OK to make getting started easier but should not be unsafe 
> (risking data loss). 
> Something under /var would be a better default log.dir under *nix.  Or 
> relative to the Kafka bin directory to avoid needing root.  
> If the default cannot be changed, I would suggest printing a special warning to 
> the console on broker startup if log.dir is under /tmp. 
> See [users list 
> thread|http://mail-archives.apache.org/mod_mbox/kafka-users/201607.mbox/%3cCAD5tkZb-0MMuWJqHNUJ3i1+xuNPZ4tnQt-RPm65grxE0=0o...@mail.gmail.com%3e].
>   I've also been bitten by this. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5412) Using connect-console-sink/source.properties raises an exception related to "file" property not found

2017-06-08 Thread Paolo Patierno (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043988#comment-16043988
 ] 

Paolo Patierno commented on KAFKA-5412:
---

Thanks [~rhauch]! I'll do the PR ...
Btw, I already sent a request to be added to the contributors list a couple of 
days ago, but I haven't had a response yet :-(

> Using connect-console-sink/source.properties raises an exception related to 
> "file" property not found
> -
>
> Key: KAFKA-5412
> URL: https://issues.apache.org/jira/browse/KAFKA-5412
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.11.0.0
>Reporter: Paolo Patierno
>
> Hi,
> with the latest 0.11.1.0-SNAPSHOT, the Kafka Connect example using 
> connect-console-sink/source.properties doesn't work anymore because the 
> required "file" property isn't found.
> This is because the underlying FileStreamSink/Source connector and task 
> define a ConfigDef with "file" as a mandatory parameter. In the console 
> example we want file=null so that stdin and stdout are used.
> One possible solution is to set "file=" inside the provided 
> connect-console-sink/source.properties.
> The other one could be to modify the FileStreamSink/Source source code in 
> order to remove the "file" definition from the ConfigDef.
> What do you think?
> I can provide a PR for that.
> Thanks,
> Paolo.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5412) Using connect-console-sink/source.properties raises an exception related to "file" property not found

2017-06-08 Thread Randall Hauch (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043986#comment-16043986
 ] 

Randall Hauch commented on KAFKA-5412:
--

Please fix both the FileStreamSink and FileStreamSource. BTW, while you're 
making changes, it might also be good to expand the description of the "file" 
{{ConfigDef}} to say what the value should represent and what it means to not 
set it. Thanks!
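
A minimal sketch of what that could look like with the ConfigDef builder, assuming "file" is made optional by giving it a null default (the description text below is a placeholder, not the final wording):
{noformat}
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.common.config.ConfigDef.Importance;
import org.apache.kafka.common.config.ConfigDef.Type;

public class FileStreamConfig {
    // A null default makes "file" optional, so the console example
    // (which relies on stdin/stdout) no longer fails validation.
    static final ConfigDef CONFIG_DEF = new ConfigDef()
            .define("file", Type.STRING, null, Importance.HIGH,
                    "Path of the source/destination file. If not set, the connector "
                            + "reads from stdin (source) or writes to stdout (sink).");
}
{noformat}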

> Using connect-console-sink/source.properties raises an exception related to 
> "file" property not found
> -
>
> Key: KAFKA-5412
> URL: https://issues.apache.org/jira/browse/KAFKA-5412
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.11.0.0
>Reporter: Paolo Patierno
>
> Hi,
> with the latest 0.11.1.0-SNAPSHOT, the Kafka Connect example using 
> connect-console-sink/source.properties doesn't work anymore because the 
> required "file" property isn't found.
> This is because the underlying FileStreamSink/Source connector and task 
> define a ConfigDef with "file" as a mandatory parameter. In the console 
> example we want file=null so that stdin and stdout are used.
> One possible solution is to set "file=" inside the provided 
> connect-console-sink/source.properties.
> The other one could be to modify the FileStreamSink/Source source code in 
> order to remove the "file" definition from the ConfigDef.
> What do you think?
> I can provide a PR for that.
> Thanks,
> Paolo.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5412) Using connect-console-sink/source.properties raises an exception related to "file" property not found

2017-06-08 Thread Randall Hauch (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043984#comment-16043984
 ] 

Randall Hauch commented on KAFKA-5412:
--

[~ppatierno], thanks! Go ahead and submit a pull request with a description 
that begins with "KAFKA-5412", and see https://kafka.apache.org/contributing 
for details. That page suggests you email "dev@kafka.apache.org" and request to 
be added to the contributors list, at which point you can assign KAFKA issues 
to yourself.

> Using connect-console-sink/source.properties raises an exception related to 
> "file" property not found
> -
>
> Key: KAFKA-5412
> URL: https://issues.apache.org/jira/browse/KAFKA-5412
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.11.0.0
>Reporter: Paolo Patierno
>
> Hi,
> with the latest 0.11.1.0-SNAPSHOT, the Kafka Connect example using 
> connect-console-sink/source.properties doesn't work anymore because the 
> required "file" property isn't found.
> This is because the underlying FileStreamSink/Source connector and task 
> define a ConfigDef with "file" as a mandatory parameter. In the console 
> example we want file=null so that stdin and stdout are used.
> One possible solution is to set "file=" inside the provided 
> connect-console-sink/source.properties.
> The other one could be to modify the FileStreamSink/Source source code in 
> order to remove the "file" definition from the ConfigDef.
> What do you think?
> I can provide a PR for that.
> Thanks,
> Paolo.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-3199) LoginManager should allow using an existing Subject

2017-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043891#comment-16043891
 ] 

ASF GitHub Bot commented on KAFKA-3199:
---

Github user kunickiaj closed the pull request at:

https://github.com/apache/kafka/pull/862


> LoginManager should allow using an existing Subject
> ---
>
> Key: KAFKA-3199
> URL: https://issues.apache.org/jira/browse/KAFKA-3199
> Project: Kafka
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 0.9.0.0
>Reporter: Adam Kunicki
>Assignee: Adam Kunicki
>Priority: Critical
>
> LoginManager currently creates a new Login in the constructor which then 
> performs a login and starts a ticket renewal thread. The problem here is that 
> because Kafka performs its own login, it doesn't offer the ability to re-use 
> an existing subject that's already managed by the client application.
> The goal of LoginManager appears to be to return a valid Subject. It would be 
> a simple fix to have LoginManager.acquireLoginManager() check for a new 
> config, e.g. kerberos.use.existing.subject. Instead of creating a new Login in 
> the constructor, this would simply call 
> Subject.getSubject(AccessController.getContext()); to use the already 
> logged-in Subject.
> This is also doable without introducing a new configuration, by simply 
> checking whether there is already a valid Subject available, but I think it 
> may be preferable to require that users explicitly request this behavior.
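
For reference, the standard JAAS call mentioned above can be wrapped as simply as this; the surrounding helper is hypothetical, and only Subject.getSubject and AccessController.getContext are existing APIs:
{noformat}
import java.security.AccessController;
import javax.security.auth.Subject;

class ExistingSubjectLookup {
    // Returns the Subject associated with the current access control context,
    // or null if the calling application has not already performed a login.
    static Subject currentSubject() {
        return Subject.getSubject(AccessController.getContext());
    }
}
{noformat}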



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #862: KAFKA-3199: LoginManager should allow using an exis...

2017-06-08 Thread kunickiaj
Github user kunickiaj closed the pull request at:

https://github.com/apache/kafka/pull/862


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is back to normal : kafka-trunk-jdk7 #2375

2017-06-08 Thread Apache Jenkins Server
See 




[jira] [Commented] (KAFKA-3450) Producer blocks on send to topic that doesn't exist if auto create is disabled

2017-06-08 Thread Soumya Bhaumik (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043829#comment-16043829
 ] 

Soumya Bhaumik commented on KAFKA-3450:
---

Just checking to see if there is any update on this.

I am using the producer for Kafka 2.11.
Apart from the issue noted here, I find that producer.send() also becomes 
blocking when the Kafka server at the specified host is unavailable.

> Producer blocks on send to topic that doesn't exist if auto create is disabled
> --
>
> Key: KAFKA-3450
> URL: https://issues.apache.org/jira/browse/KAFKA-3450
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.9.0.1
>Reporter: Michal Turek
>Priority: Critical
>
> {{producer.send()}} is blocked for {{max.block.ms}} (default 60 seconds) if 
> the destination topic doesn't exist and automatic topic creation is 
> disabled. A warning from NetworkClient containing UNKNOWN_TOPIC_OR_PARTITION is 
> logged every 100 ms in a loop until the 60-second timeout expires, but the 
> operation is not recoverable.
> Preconditions
> - Kafka 0.9.0.1 with default configuration and auto.create.topics.enable=false
> - Kafka 0.9.0.1 clients.
> Example minimalist code
> https://github.com/avast/kafka-tests/blob/master/src/main/java/com/avast/kafkatests/othertests/nosuchtopic/NoSuchTopicTest.java
> {noformat}
> /**
>  * Test of sending to a topic that does not exist while automatic creation of
>  * topics is disabled in Kafka (auto.create.topics.enable=false).
>  */
> public class NoSuchTopicTest {
>     private static final Logger LOGGER = LoggerFactory.getLogger(NoSuchTopicTest.class);
>     public static void main(String[] args) {
>         Properties properties = new Properties();
>         properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
>         properties.setProperty(ProducerConfig.CLIENT_ID_CONFIG, NoSuchTopicTest.class.getSimpleName());
>         properties.setProperty(ProducerConfig.MAX_BLOCK_MS_CONFIG, "1000"); // Default is 60 seconds
>         try (Producer<String, String> producer =
>                  new KafkaProducer<>(properties, new StringSerializer(), new StringSerializer())) {
>             LOGGER.info("Sending message");
>             producer.send(new ProducerRecord<>("ThisTopicDoesNotExist", "key", "value"),
>                     (metadata, exception) -> {
>                         if (exception != null) {
>                             LOGGER.error("Send failed: {}", exception.toString());
>                         } else {
>                             LOGGER.info("Send successful: {}-{}/{}",
>                                     metadata.topic(), metadata.partition(), metadata.offset());
>                         }
>                     });
>             LOGGER.info("Sending message");
>             producer.send(new ProducerRecord<>("ThisTopicDoesNotExistToo", "key", "value"),
>                     (metadata, exception) -> {
>                         if (exception != null) {
>                             LOGGER.error("Send failed: {}", exception.toString());
>                         } else {
>                             LOGGER.info("Send successful: {}-{}/{}",
>                                     metadata.topic(), metadata.partition(), metadata.offset());
>                         }
>                     });
>         }
>     }
> }
> {noformat}
> Related output
> {noformat}
> 2016-03-23 12:44:37.725 INFO  c.a.k.o.nosuchtopic.NoSuchTopicTest [main]: 
> Sending message (NoSuchTopicTest.java:26)
> 2016-03-23 12:44:37.830 WARN  o.a.kafka.clients.NetworkClient 
> [kafka-producer-network-thread | NoSuchTopicTest]: Error while fetching 
> metadata with correlation id 0 : 
> {ThisTopicDoesNotExist=UNKNOWN_TOPIC_OR_PARTITION} (NetworkClient.java:582)
> 2016-03-23 12:44:37.928 WARN  o.a.kafka.clients.NetworkClient 
> [kafka-producer-network-thread | NoSuchTopicTest]: Error while fetching 
> metadata with correlation id 1 : 
> {ThisTopicDoesNotExist=UNKNOWN_TOPIC_OR_PARTITION} (NetworkClient.java:582)
> 2016-03-23 12:44:38.028 WARN  o.a.kafka.clients.NetworkClient 
> [kafka-producer-network-thread | NoSuchTopicTest]: Error while fetching 
> metadata with correlation id 2 : 
> {ThisTopicDoesNotExist=UNKNOWN_TOPIC_OR_PARTITION} (NetworkClient.java:582)
> 2016-03-23 12:44:38.130 WARN  o.a.kafka.clients.NetworkClient 
> [kafka-producer-network-thread | NoSuchTopicTest]: Error while fetching 
> metadata with correlation id 3 : 
> {ThisTopicDoesNotExist=UNKNOWN_TOPIC_OR_PARTITION} (NetworkClient.java:582)
> 2016-03-23 12:44:38.231 WARN  o.a.kafka.clients.NetworkClient 
> [kafka-producer-network-thread | NoSuchTopicTest]: Error while fetching 
> metadata with correlation id 4 : 
> {ThisTopicDoesNotExist=UNKNOWN_TOPIC_OR_PARTITION} (NetworkClient.java:582)
> 2016-03-23 12:44:38.332 WARN  o.a.kafka.clients.NetworkClient 
> [kafka-producer-network-thread | NoSuchTopicTest]: Error while fetching 
> metadata with correl

[jira] [Reopened] (KAFKA-5327) Console Consumer should only poll for up to max messages

2017-06-08 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson reopened KAFKA-5327:


I have reverted this change in KAFKA-5414 since it breaks existing usage. It 
seems like a simpler option is just to ensure that max.poll.records is not set 
higher than max messages.
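
A minimal sketch of that option, assuming the --max-messages value and the consumer Properties are both in hand (the class and method names are illustrative, not the actual ConsoleConsumer code):
{noformat}
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

class MaxMessagesClamp {
    // Ensure the consumer never fetches more records per poll than we intend to print,
    // so committed offsets cannot run ahead of the messages shown on the console.
    static void clampMaxPollRecords(Properties consumerProps, int maxMessages) {
        if (maxMessages > 0) {
            String configured = consumerProps.getProperty(ConsumerConfig.MAX_POLL_RECORDS_CONFIG);
            int current = configured == null ? 500 : Integer.parseInt(configured); // 500 is the default
            consumerProps.setProperty(ConsumerConfig.MAX_POLL_RECORDS_CONFIG,
                    String.valueOf(Math.min(current, maxMessages)));
        }
    }
}
{noformat}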

> Console Consumer should only poll for up to max messages
> 
>
> Key: KAFKA-5327
> URL: https://issues.apache.org/jira/browse/KAFKA-5327
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Reporter: Dustin Cote
>Assignee: huxihx
>Priority: Minor
> Fix For: 0.11.0.0
>
>
> The ConsoleConsumer has a --max-messages flag that can be used to limit the 
> number of messages consumed. However, the number of records actually consumed 
> is governed by max.poll.records. This means you see one message on the 
> console, but your offset has moved forward by a default of 500, which is kind 
> of counterintuitive. It would be good to only commit offsets for messages we 
> have printed to the console.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5414) Console consumer offset commit regression

2017-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043786#comment-16043786
 ] 

ASF GitHub Bot commented on KAFKA-5414:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3277


> Console consumer offset commit regression
> -
>
> Key: KAFKA-5414
> URL: https://issues.apache.org/jira/browse/KAFKA-5414
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Critical
> Fix For: 0.11.0.0
>
>
> In KAFKA-5327, the behavior of console consumer was changed to only commit 
> offsets when the process closes. Previously we used periodic offset commits. 
> This breaks existing usage in system tests and probably elsewhere.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #3277: KAFKA-5414: Revert "KAFKA-5327: ConsoleConsumer sh...

2017-06-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3277


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (KAFKA-5414) Console consumer offset commit regression

2017-06-08 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-5414.

Resolution: Fixed

Issue resolved by pull request 3277
[https://github.com/apache/kafka/pull/3277]

> Console consumer offset commit regression
> -
>
> Key: KAFKA-5414
> URL: https://issues.apache.org/jira/browse/KAFKA-5414
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Critical
> Fix For: 0.11.0.0
>
>
> In KAFKA-5327, the behavior of console consumer was changed to only commit 
> offsets when the process closes. Previously we used periodic offset commits. 
> This breaks existing usage in system tests and probably elsewhere.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (KAFKA-5152) Kafka Streams keeps restoring state after shutdown is initiated during startup

2017-06-08 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-5152:
--

Assignee: Matthias J. Sax

> Kafka Streams keeps restoring state after shutdown is initiated during startup
> --
>
> Key: KAFKA-5152
> URL: https://issues.apache.org/jira/browse/KAFKA-5152
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.1
>Reporter: Xavier Léauté
>Assignee: Matthias J. Sax
> Fix For: 0.10.2.2, 0.11.0.1
>
>
> If a streams shutdown is initiated during state restore (e.g. because an 
> uncaught exception is thrown), streams will not shut down until all stores 
> have finished restoring.
> As restore progresses, stream threads appear to be taken out of service as 
> part of the shutdown sequence, causing rebalancing of tasks. This compounds 
> the problem by slowing down the restore process even further, since the 
> remaining threads now have to also restore the reassigned tasks before they 
> can shut down.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5152) Kafka Streams keeps restoring state after shutdown is initiated during startup

2017-06-08 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-5152:
---
Fix Version/s: 0.11.0.1
   0.10.2.2

> Kafka Streams keeps restoring state after shutdown is initiated during startup
> --
>
> Key: KAFKA-5152
> URL: https://issues.apache.org/jira/browse/KAFKA-5152
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.1
>Reporter: Xavier Léauté
>Assignee: Matthias J. Sax
> Fix For: 0.10.2.2, 0.11.0.1
>
>
> If a streams shutdown is initiated during state restore (e.g. because an 
> uncaught exception is thrown), streams will not shut down until all stores 
> have finished restoring.
> As restore progresses, stream threads appear to be taken out of service as 
> part of the shutdown sequence, causing rebalancing of tasks. This compounds 
> the problem by slowing down the restore process even further, since the 
> remaining threads now have to also restore the reassigned tasks before they 
> can shut down.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5415) TransactionCoordinator gets stuck in PrepareCommit state

2017-06-08 Thread Apurva Mehta (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043758#comment-16043758
 ] 

Apurva Mehta commented on KAFKA-5415:
-

The interesting thing is that it reproduces consistently on Jenkins only when 
it is part of a multi-test run. Running this test in isolation makes it look 
stable.

> TransactionCoordinator gets stuck in PrepareCommit state
> 
>
> Key: KAFKA-5415
> URL: https://issues.apache.org/jira/browse/KAFKA-5415
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
> Attachments: 6.tgz
>
>
> This has been revealed by the system test failures on jenkins. 
> The transaction coordinator seems to get into a path during the handling of 
> the EndTxnRequest where it returns an error (possibly a NOT_COORDINATOR or 
> COORDINATOR_NOT_AVAILABLE error, to be revealed by 
> https://github.com/apache/kafka/pull/3278) to the client. However, due to 
> network instability, the producer is disconnected before it receives this 
> error.
> As a result, the transaction remains in a `PrepareXX` state, and future 
> `EndTxn` requests sent by the client after reconnecting result in a 
> `CONCURRENT_TRANSACTION` error code. Hence the client gets stuck and the 
> transaction never finishes, as expiration isn't done from a PrepareXX state.
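
For context, a minimal sketch of the client-side flow this affects, using the 
public transactional producer API (this is not the system test code; the 
broker address and topic name are illustrative). commitTransaction() is the 
call that issues the EndTxnRequest, so with the coordinator stuck in 
PrepareCommit it never completes:

{code}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TransactionalProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("transactional.id", "my-first-transactional-id");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("output-topic", "key", "value"));
            // commitTransaction() sends the EndTxnRequest. In the failure
            // described above the coordinator is left in PrepareCommit, the
            // retried EndTxn keeps getting CONCURRENT_TRANSACTIONS, and this
            // call never returns successfully.
            producer.commitTransaction();
        }
    }
}
{code}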



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5242) add max_number _of_retries to exponential backoff strategy

2017-06-08 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043753#comment-16043753
 ] 

Matthias J. Sax commented on KAFKA-5242:


While trying to add a new integration test for Streams EOS, I encountered a 
similar issue that prevented the intended test design. In order to make the 
test work, I needed to use two instances with one thread each and different 
state directories to isolate both, instead of using one instance with 2 
threads (cf. 
{{EosIntegrationTest#shouldNotViolateEosIfOneTaskGetsFencedUsingIsolatedAppInstances()}}).

In order to fix this properly, we should extract state restoration from the 
rebalance callback functions and embed task creation and state restoration 
within the main run-loop. We should also call poll() regularly if we cannot 
get a lock for a task and during store restoration (right now, for big state, 
we might miss a rebalance -- but we should actually try to participate in the 
rebalance and "abort" state restoration if a task is migrated away during the 
rebalance).

> add max_number _of_retries to exponential backoff strategy
> --
>
> Key: KAFKA-5242
> URL: https://issues.apache.org/jira/browse/KAFKA-5242
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Lukas Gemela
>Assignee: Matthias J. Sax
>Priority: Minor
> Fix For: 0.10.2.2, 0.11.0.1
>
> Attachments: clio_170511.log
>
>
> From time to time, during rebalance, we are getting a lot of exceptions saying 
> {code}
> org.apache.kafka.streams.errors.LockException: task [0_0] Failed to lock the 
> state directory: /app/db/clio/0_0
>   at 
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.<init>(ProcessorStateManager.java:102)
>  ~[kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:73)
>  ~[kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:108)
>  ~[kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:834)
>  ~[kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:1207)
>  ~[kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:1180)
>  [kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:937)
>  [kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.access$500(StreamThread.java:69)
>  [kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:236)
>  [kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:255)
>  [kafka-clients-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:339)
>  [kafka-clients-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:303)
>  [kafka-clients-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:286)
>  [kafka-clients-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1030)
>  [kafka-clients-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995) 
> [kafka-clients-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:582)
>  [kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:368)
>  [kafka-streams-0.10.2.0.jar!/:?]
> {code}
> (see attached logfile)
> It was actually a problem on our side - we ran startStreams() twice and 
> therefore had two threads touching the same folder structure. 
> But what I've noticed is that the backoff strategy in 
> StreamThread$AbstractTaskCreator.retryWithBackoff can run endlessly - after 
> 20 iterations it takes 6 hours until the next attempt to start a task. 
> I've noticed the latest code contains a check for rebalanceTimeoutMs, but 
> that still does not solve the problem, especially in case 
> MAX_POLL_INTERVAL_MS_CONFIG is set to Integer.MAX_VALUE. At this stage Kafka 
> Streams just hangs indefinitely.
> I w

[jira] [Comment Edited] (KAFKA-5415) TransactionCoordinator gets stuck in PrepareCommit state

2017-06-08 Thread Apurva Mehta (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043741#comment-16043741
 ] 

Apurva Mehta edited comment on KAFKA-5415 at 6/9/17 12:59 AM:
--

Incriminating evidence from the logs.

1. Tail of the transaction log is PrepareCommit at offset 277
{noformat}
offset: 276 position: 46886 CreateTime: 1496957141444 isvalid: true keysize: 29 
valuesize: 95 magic: 2 compresscodec: NONE producerId: -1 sequence: -1 
isTransactional: false headerKeys: [] key: 
transactionalId=my-first-transactional-id payload: 
producerId:2000,producerEpoch:0,state=Ongoing,partitions=Set(output-topic-2, 
__consumer_offsets-47, output-topic-0, 
output-topic-1),lastUpdateTimestamp=1496957141444
offset: 277 position: 47080 CreateTime: 1496957141285 isvalid: true keysize: 29 
valuesize: 95 magic: 2 compresscodec: NONE producerId: -1 sequence: -1 
isTransactional: false headerKeys: [] key: 
transactionalId=my-first-transactional-id payload: 
producerId:2000,producerEpoch:0,state=PrepareCommit,partitions=Set(output-topic-2,
 __consumer_offsets-47, output-topic-0, 
output-topic-1),lastUpdateTimestamp=1496957141285
{noformat}

2. The client disconnected from the coordinator after sending the EndTxn 
request, and got 'CONCURRENT_TRANSACTIONS' on retry.
{noformat}
[2017-06-08 21:25:41,474] TRACE Produced messages to topic-partition 
output-topic-2 with base offset offset 28618 and error: null. 
(org.apache.kafka.clients.producer.internals.ProducerBatch)
[2017-06-08 21:25:41,474] TRACE [TransactionalId my-first-transactional-id] 
Request (type=EndTxnRequest, transactionalId=my-first-transactional-id, 
producerId=2000, producerEpoch=0, result=COMMIT) dequeued for sending 
(org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:25:41,474] DEBUG [TransactionalId my-first-transactional-id] 
Sending transactional request (type=EndTxnRequest, 
transactionalId=my-first-transactional-id, producerId=2000, producerEpoch=0, 
result=COMMIT) to node worker12:9092 (id: 3 rack: null) 
(org.apache.kafka.clients.producer.internals.Sender)
[2017-06-08 21:26:11,533] DEBUG [TransactionalId my-first-transactional-id] 
Disconnected from 3. Will retry. 
(org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:26:11,533] DEBUG [TransactionalId my-first-transactional-id] 
Enqueuing transactional request (type=FindCoordinatorRequest, 
coordinatorKey=my-first-transactional-id, coordinatorType=TRANSACTION) 
(org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:26:11,533] DEBUG [TransactionalId my-first-transactional-id] 
Enqueuing transactional request (type=EndTxnRequest, 
transactionalId=my-first-transactional-id, producerId=2000, producerEpoch=0, 
result=COMMIT) (org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:26:11,533] TRACE [TransactionalId my-first-transactional-id] 
Request (type=FindCoordinatorRequest, coordinatorKey=my-first-transactional-id, 
coordinatorType=TRANSACTION) dequeued for sending 
(org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:26:11,534] DEBUG [TransactionalId my-first-transactional-id] 
Sending transactional request (type=FindCoordinatorRequest, 
coordinatorKey=my-first-transactional-id, coordinatorType=TRANSACTION) to node 
worker5:9092 (id: 2 rack: null) 
(org.apache.kafka.clients.producer.internals.Sender)
[2017-06-08 21:26:11,535] TRACE [TransactionalId my-first-transactional-id] 
Received transactional response FindCoordinatorResponse(throttleTimeMs=0, 
errorMessage='null', error=NONE, node=worker12:9092 (id: 3 rack: null)) for 
request (type=FindCoordinatorRequest, coordinatorKey=my-first-transactional-id, 
coordinatorType=TRANSACTION) 
(org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:26:11,535] TRACE [TransactionalId my-first-transactional-id] 
Request (type=EndTxnRequest, transactionalId=my-first-transactional-id, 
producerId=2000, producerEpoch=0, result=COMMIT) dequeued for sending 
(org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:26:11,535] DEBUG [TransactionalId my-first-transactional-id] 
Disconnect from worker12:9092 (id: 3 rack: null) while trying to send request 
(type=EndTxnRequest, transactionalId=my-first-transactional-id, 
producerId=2000, producerEpoch=0, result=COMMIT). Going to back off and retry 
(org.apache.kafka.clients.producer.internals.Sender)
[2017-06-08 21:26:11,535] DEBUG [TransactionalId my-first-transactional-id] 
Enqueuing transactional request (type=FindCoordinatorRequest, 
coordinatorKey=my-first-transactional-id, coordinatorType=TRANSACTION) 
(org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:26:11,535] DEBUG [TransactionalId my-first-transactional-id] 
Enqueuing transactional request (type=EndTxnRequest, 
transactionalId=my-first-tra

[jira] [Updated] (KAFKA-5242) add max_number _of_retries to exponential backoff strategy

2017-06-08 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-5242:
---
Fix Version/s: 0.11.0.1
   0.10.2.2

> add max_number _of_retries to exponential backoff strategy
> --
>
> Key: KAFKA-5242
> URL: https://issues.apache.org/jira/browse/KAFKA-5242
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Lukas Gemela
>Assignee: Matthias J. Sax
>Priority: Minor
> Fix For: 0.10.2.2, 0.11.0.1
>
> Attachments: clio_170511.log
>
>
> From time to time, during rebalance, we are getting a lot of exceptions saying 
> {code}
> org.apache.kafka.streams.errors.LockException: task [0_0] Failed to lock the 
> state directory: /app/db/clio/0_0
>   at 
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.<init>(ProcessorStateManager.java:102)
>  ~[kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:73)
>  ~[kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:108)
>  ~[kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:834)
>  ~[kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:1207)
>  ~[kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:1180)
>  [kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:937)
>  [kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.access$500(StreamThread.java:69)
>  [kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:236)
>  [kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:255)
>  [kafka-clients-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:339)
>  [kafka-clients-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:303)
>  [kafka-clients-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:286)
>  [kafka-clients-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1030)
>  [kafka-clients-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995) 
> [kafka-clients-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:582)
>  [kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:368)
>  [kafka-streams-0.10.2.0.jar!/:?]
> {code}
> (see attached logfile)
> It was actually a problem on our side - we ran startStreams() twice and 
> therefore had two threads touching the same folder structure. 
> But what I've noticed is that the backoff strategy in 
> StreamThread$AbstractTaskCreator.retryWithBackoff can run endlessly - after 
> 20 iterations it takes 6 hours until the next attempt to start a task. 
> I've noticed the latest code contains a check for rebalanceTimeoutMs, but 
> that still does not solve the problem, especially in case 
> MAX_POLL_INTERVAL_MS_CONFIG is set to Integer.MAX_VALUE. At this stage Kafka 
> Streams just hangs indefinitely.
> I would personally make that backoff strategy a bit more configurable, with a 
> maximum number of retries; if it is exceeded, the exception is propagated to 
> the custom client exception handler like any other exception.
> (I can provide a patch)
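
As an illustration of that suggestion, a minimal sketch (not the actual 
StreamThread code; method and parameter names are made up) of an exponential 
backoff helper that gives up after a configurable number of attempts and 
rethrows the last failure instead of retrying forever:

{code}
import java.util.concurrent.Callable;

public class BoundedBackoffSketch {

    static <T> T retryWithBackoff(Callable<T> action,
                                  int maxRetries,
                                  long initialBackoffMs) throws Exception {
        long backoffMs = initialBackoffMs;
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;                    // e.g. a LockException
                if (attempt == maxRetries) {
                    break;                   // out of retries, rethrow below
                }
                Thread.sleep(backoffMs);     // wait before the next attempt
                backoffMs *= 2;              // exponential backoff
            }
        }
        throw last;                          // surfaces to the caller
    }

    public static void main(String[] args) throws Exception {
        // Usage example: give up after 5 retries, starting at 50 ms.
        System.out.println(retryWithBackoff(() -> "task created", 5, 50L));
    }
}
{code}

With a bound like this, a persistent LockException would surface to the 
application's exception handler after maxRetries attempts instead of blocking 
for hours.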



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (KAFKA-5242) add max_number _of_retries to exponential backoff strategy

2017-06-08 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-5242:
--

Assignee: Matthias J. Sax

> add max_number _of_retries to exponential backoff strategy
> --
>
> Key: KAFKA-5242
> URL: https://issues.apache.org/jira/browse/KAFKA-5242
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Lukas Gemela
>Assignee: Matthias J. Sax
>Priority: Minor
> Attachments: clio_170511.log
>
>
> From time to time, during rebalance, we are getting a lot of exceptions saying 
> {code}
> org.apache.kafka.streams.errors.LockException: task [0_0] Failed to lock the 
> state directory: /app/db/clio/0_0
>   at 
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.<init>(ProcessorStateManager.java:102)
>  ~[kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:73)
>  ~[kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:108)
>  ~[kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:834)
>  ~[kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:1207)
>  ~[kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:1180)
>  [kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:937)
>  [kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.access$500(StreamThread.java:69)
>  [kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:236)
>  [kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:255)
>  [kafka-clients-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:339)
>  [kafka-clients-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:303)
>  [kafka-clients-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:286)
>  [kafka-clients-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1030)
>  [kafka-clients-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995) 
> [kafka-clients-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:582)
>  [kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:368)
>  [kafka-streams-0.10.2.0.jar!/:?]
> {code}
> (see attached logfile)
> It was actually a problem on our side - we ran startStreams() twice and 
> therefore had two threads touching the same folder structure. 
> But what I've noticed is that the backoff strategy in 
> StreamThread$AbstractTaskCreator.retryWithBackoff can run endlessly - after 
> 20 iterations it takes 6 hours until the next attempt to start a task. 
> I've noticed the latest code contains a check for rebalanceTimeoutMs, but 
> that still does not solve the problem, especially in case 
> MAX_POLL_INTERVAL_MS_CONFIG is set to Integer.MAX_VALUE. At this stage Kafka 
> Streams just hangs indefinitely.
> I would personally make that backoff strategy a bit more configurable, with a 
> maximum number of retries; if it is exceeded, the exception is propagated to 
> the custom client exception handler like any other exception.
> (I can provide a patch)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5415) TransactionCoordinator gets stuck in PrepareCommit state

2017-06-08 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta updated KAFKA-5415:

Summary: TransactionCoordinator gets stuck in PrepareCommit state  (was: 
TransactionCoordinator gets stuck in PrepareCommits state.)

> TransactionCoordinator gets stuck in PrepareCommit state
> 
>
> Key: KAFKA-5415
> URL: https://issues.apache.org/jira/browse/KAFKA-5415
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
> Attachments: 6.tgz
>
>
> This has been revealed by the system test failures on jenkins. 
> The transaction coordinator seems to get into a path during the handling of 
> the EndTxnRequest where it returns an error (possibly a NOT_COORDINATOR or 
> COORDINATOR_NOT_AVAILABLE error, to be revealed by 
> https://github.com/apache/kafka/pull/3278) to the client. However, due to 
> network instability, the producer is disconnected before it receives this 
> error.
> As a result, the transaction remains in a `PrepareXX` state, and future 
> `EndTxn` requests sent by the client after reconnecting result in a 
> `CONCURRENT_TRANSACTIONS` error code. Hence the client gets stuck and the 
> transaction never finishes, as expiration isn't done from a PrepareXX state.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (KAFKA-5415) TransactionCoordinator gets stuck in PrepareCommits state.

2017-06-08 Thread Apurva Mehta (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043741#comment-16043741
 ] 

Apurva Mehta edited comment on KAFKA-5415 at 6/9/17 12:50 AM:
--

Incriminating evidence from the logs.

1. Tail of the transaction log is PrepareCommit at offset 277
{noformat}
offset: 276 position: 46886 CreateTime: 1496957141444 isvalid: true keysize: 29 
valuesize: 95 magic: 2 compresscodec: NONE producerId: -1 sequence: -1 
isTransactional: false headerKeys: [] key: 
transactionalId=my-first-transactional-id payload: 
producerId:2000,producerEpoch:0,state=Ongoing,partitions=Set(output-topic-2, 
__consumer_offsets-47, output-topic-0, 
output-topic-1),lastUpdateTimestamp=1496957141444
offset: 277 position: 47080 CreateTime: 1496957141285 isvalid: true keysize: 29 
valuesize: 95 magic: 2 compresscodec: NONE producerId: -1 sequence: -1 
isTransactional: false headerKeys: [] key: 
transactionalId=my-first-transactional-id payload: 
producerId:2000,producerEpoch:0,state=PrepareCommit,partitions=Set(output-topic-2,
 __consumer_offsets-47, output-topic-0, 
output-topic-1),lastUpdateTimestamp=1496957141285
{noformat}

2. The client disconnected from the coordinator after sending the EndTxn 
request, and got 'CONCURRENT_TRANSACTIONS' on retry.
{noformat}
[2017-06-08 21:25:41,474] TRACE Produced messages to topic-partition 
output-topic-2 with base offset offset 28618 and error: null. 
(org.apache.kafka.clients.producer.internals.ProducerBatch)
[2017-06-08 21:25:41,474] TRACE [TransactionalId my-first-transactional-id] 
Request (type=EndTxnRequest, transactionalId=my-first-transactional-id, 
producerId=2000, producerEpoch=0, result=COMMIT) dequeued for sending 
(org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:25:41,474] DEBUG [TransactionalId my-first-transactional-id] 
Sending transactional request (type=EndTxnRequest, 
transactionalId=my-first-transactional-id, producerId=2000, producerEpoch=0, 
result=COMMIT) to node worker12:9092 (id: 3 rack: null) 
(org.apache.kafka.clients.producer.internals.Sender)
[2017-06-08 21:26:11,533] DEBUG [TransactionalId my-first-transactional-id] 
Disconnected from 3. Will retry. 
(org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:26:11,533] DEBUG [TransactionalId my-first-transactional-id] 
Enqueuing transactional request (type=FindCoordinatorRequest, 
coordinatorKey=my-first-transactional-id, coordinatorType=TRANSACTION) 
(org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:26:11,533] DEBUG [TransactionalId my-first-transactional-id] 
Enqueuing transactional request (type=EndTxnRequest, 
transactionalId=my-first-transactional-id, producerId=2000, producerEpoch=0, 
result=COMMIT) (org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:26:11,533] TRACE [TransactionalId my-first-transactional-id] 
Request (type=FindCoordinatorRequest, coordinatorKey=my-first-transactional-id, 
coordinatorType=TRANSACTION) dequeued for sending 
(org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:26:11,534] DEBUG [TransactionalId my-first-transactional-id] 
Sending transactional request (type=FindCoordinatorRequest, 
coordinatorKey=my-first-transactional-id, coordinatorType=TRANSACTION) to node 
worker5:9092 (id: 2 rack: null) 
(org.apache.kafka.clients.producer.internals.Sender)
[2017-06-08 21:26:11,535] TRACE [TransactionalId my-first-transactional-id] 
Received transactional response FindCoordinatorResponse(throttleTimeMs=0, 
errorMessage='null', error=NONE, node=worker12:9092 (id: 3 rack: null)) for 
request (type=FindCoordinatorRequest, coordinatorKey=my-first-transactional-id, 
coordinatorType=TRANSACTION) 
(org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:26:11,535] TRACE [TransactionalId my-first-transactional-id] 
Request (type=EndTxnRequest, transactionalId=my-first-transactional-id, 
producerId=2000, producerEpoch=0, result=COMMIT) dequeued for sending 
(org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:26:11,535] DEBUG [TransactionalId my-first-transactional-id] 
Disconnect from worker12:9092 (id: 3 rack: null) while trying to send request 
(type=EndTxnRequest, transactionalId=my-first-transactional-id, 
producerId=2000, producerEpoch=0, result=COMMIT). Going to back off and retry 
(org.apache.kafka.clients.producer.internals.Sender)
[2017-06-08 21:26:11,535] DEBUG [TransactionalId my-first-transactional-id] 
Enqueuing transactional request (type=FindCoordinatorRequest, 
coordinatorKey=my-first-transactional-id, coordinatorType=TRANSACTION) 
(org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:26:11,535] DEBUG [TransactionalId my-first-transactional-id] 
Enqueuing transactional request (type=EndTxnRequest, 
transactionalId=my-first-tra

[jira] [Comment Edited] (KAFKA-5415) TransactionCoordinator gets stuck in PrepareCommits state.

2017-06-08 Thread Apurva Mehta (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043741#comment-16043741
 ] 

Apurva Mehta edited comment on KAFKA-5415 at 6/9/17 12:51 AM:
--

Incriminating evidence from the logs.

1. Tail of the transaction log is PrepareCommit at offset 277
{noformat}
offset: 276 position: 46886 CreateTime: 1496957141444 isvalid: true keysize: 29 
valuesize: 95 magic: 2 compresscodec: NONE producerId: -1 sequence: -1 
isTransactional: false headerKeys: [] key: 
transactionalId=my-first-transactional-id payload: 
producerId:2000,producerEpoch:0,state=Ongoing,partitions=Set(output-topic-2, 
__consumer_offsets-47, output-topic-0, 
output-topic-1),lastUpdateTimestamp=1496957141444
offset: 277 position: 47080 CreateTime: 1496957141285 isvalid: true keysize: 29 
valuesize: 95 magic: 2 compresscodec: NONE producerId: -1 sequence: -1 
isTransactional: false headerKeys: [] key: 
transactionalId=my-first-transactional-id payload: 
producerId:2000,producerEpoch:0,state=PrepareCommit,partitions=Set(output-topic-2,
 __consumer_offsets-47, output-topic-0, 
output-topic-1),lastUpdateTimestamp=1496957141285
{noformat}

2. The client disconnected from the coordinator after sending the EndTxn 
request, and got 'CONCURRENT_TRANSACTIONS' on retry.
{noformat}
[2017-06-08 21:25:41,474] TRACE Produced messages to topic-partition 
output-topic-2 with base offset offset 28618 and error: null. 
(org.apache.kafka.clients.producer.internals.ProducerBatch)
[2017-06-08 21:25:41,474] TRACE [TransactionalId my-first-transactional-id] 
Request (type=EndTxnRequest, transactionalId=my-first-transactional-id, 
producerId=2000, producerEpoch=0, result=COMMIT) dequeued for sending 
(org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:25:41,474] DEBUG [TransactionalId my-first-transactional-id] 
Sending transactional request (type=EndTxnRequest, 
transactionalId=my-first-transactional-id, producerId=2000, producerEpoch=0, 
result=COMMIT) to node worker12:9092 (id: 3 rack: null) 
(org.apache.kafka.clients.producer.internals.Sender)
[2017-06-08 21:26:11,533] DEBUG [TransactionalId my-first-transactional-id] 
Disconnected from 3. Will retry. 
(org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:26:11,533] DEBUG [TransactionalId my-first-transactional-id] 
Enqueuing transactional request (type=FindCoordinatorRequest, 
coordinatorKey=my-first-transactional-id, coordinatorType=TRANSACTION) 
(org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:26:11,533] DEBUG [TransactionalId my-first-transactional-id] 
Enqueuing transactional request (type=EndTxnRequest, 
transactionalId=my-first-transactional-id, producerId=2000, producerEpoch=0, 
result=COMMIT) (org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:26:11,533] TRACE [TransactionalId my-first-transactional-id] 
Request (type=FindCoordinatorRequest, coordinatorKey=my-first-transactional-id, 
coordinatorType=TRANSACTION) dequeued for sending 
(org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:26:11,534] DEBUG [TransactionalId my-first-transactional-id] 
Sending transactional request (type=FindCoordinatorRequest, 
coordinatorKey=my-first-transactional-id, coordinatorType=TRANSACTION) to node 
worker5:9092 (id: 2 rack: null) 
(org.apache.kafka.clients.producer.internals.Sender)
[2017-06-08 21:26:11,535] TRACE [TransactionalId my-first-transactional-id] 
Received transactional response FindCoordinatorResponse(throttleTimeMs=0, 
errorMessage='null', error=NONE, node=worker12:9092 (id: 3 rack: null)) for 
request (type=FindCoordinatorRequest, coordinatorKey=my-first-transactional-id, 
coordinatorType=TRANSACTION) 
(org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:26:11,535] TRACE [TransactionalId my-first-transactional-id] 
Request (type=EndTxnRequest, transactionalId=my-first-transactional-id, 
producerId=2000, producerEpoch=0, result=COMMIT) dequeued for sending 
(org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:26:11,535] DEBUG [TransactionalId my-first-transactional-id] 
Disconnect from worker12:9092 (id: 3 rack: null) while trying to send request 
(type=EndTxnRequest, transactionalId=my-first-transactional-id, 
producerId=2000, producerEpoch=0, result=COMMIT). Going to back off and retry 
(org.apache.kafka.clients.producer.internals.Sender)
[2017-06-08 21:26:11,535] DEBUG [TransactionalId my-first-transactional-id] 
Enqueuing transactional request (type=FindCoordinatorRequest, 
coordinatorKey=my-first-transactional-id, coordinatorType=TRANSACTION) 
(org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:26:11,535] DEBUG [TransactionalId my-first-transactional-id] 
Enqueuing transactional request (type=EndTxnRequest, 
transactionalId=my-first-tra

[jira] [Commented] (KAFKA-5415) TransactionCoordinator gets stuck in PrepareCommits state.

2017-06-08 Thread Apurva Mehta (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043741#comment-16043741
 ] 

Apurva Mehta commented on KAFKA-5415:
-

Incriminating evidence from the logs.

1. Tail of the transaction log is PrepareCommit at offset 277
{noformat}
offset: 276 position: 46886 CreateTime: 1496957141444 isvalid: true keysize: 29 
valuesize: 95 magic: 2 compresscodec: NONE producerId: -1 sequence: -1 
isTransactional: false headerKeys: [] key: 
transactionalId=my-first-transactional-id payload: 
producerId:2000,producerEpoch:0,state=Ongoing,partitions=Set(output-topic-2, 
__consumer_offsets-47, output-topic-0, 
output-topic-1),lastUpdateTimestamp=1496957141444
offset: 277 position: 47080 CreateTime: 1496957141285 isvalid: true keysize: 29 
valuesize: 95 magic: 2 compresscodec: NONE producerId: -1 sequence: -1 
isTransactional: false headerKeys: [] key: 
transactionalId=my-first-transactional-id payload: 
producerId:2000,producerEpoch:0,state=PrepareCommit,partitions=Set(output-topic-2,
 __consumer_offsets-47, output-topic-0, 
output-topic-1),lastUpdateTimestamp=1496957141285
{noformat}

2. The client disconnected from the coordinator after sending the EndTxn request
{noformat}
[2017-06-08 21:25:41,474] TRACE Produced messages to topic-partition 
output-topic-2 with base offset offset 28618 and error: null. 
(org.apache.kafka.clients.producer.internals.ProducerBatch)
[2017-06-08 21:25:41,474] TRACE [TransactionalId my-first-transactional-id] 
Request (type=EndTxnRequest, transactionalId=my-first-transactional-id, 
producerId=2000, producerEpoch=0, result=COMMIT) dequeued for sending 
(org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:25:41,474] DEBUG [TransactionalId my-first-transactional-id] 
Sending transactional request (type=EndTxnRequest, 
transactionalId=my-first-transactional-id, producerId=2000, producerEpoch=0, 
result=COMMIT) to node worker12:9092 (id: 3 rack: null) 
(org.apache.kafka.clients.producer.internals.Sender)
[2017-06-08 21:26:11,533] DEBUG [TransactionalId my-first-transactional-id] 
Disconnected from 3. Will retry. 
(org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:26:11,533] DEBUG [TransactionalId my-first-transactional-id] 
Enqueuing transactional request (type=FindCoordinatorRequest, 
coordinatorKey=my-first-transactional-id, coordinatorType=TRANSACTION) 
(org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:26:11,533] DEBUG [TransactionalId my-first-transactional-id] 
Enqueuing transactional request (type=EndTxnRequest, 
transactionalId=my-first-transactional-id, producerId=2000, producerEpoch=0, 
result=COMMIT) (org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:26:11,533] TRACE [TransactionalId my-first-transactional-id] 
Request (type=FindCoordinatorRequest, coordinatorKey=my-first-transactional-id, 
coordinatorType=TRANSACTION) dequeued for sending 
(org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:26:11,534] DEBUG [TransactionalId my-first-transactional-id] 
Sending transactional request (type=FindCoordinatorRequest, 
coordinatorKey=my-first-transactional-id, coordinatorType=TRANSACTION) to node 
worker5:9092 (id: 2 rack: null) 
(org.apache.kafka.clients.producer.internals.Sender)
[2017-06-08 21:26:11,535] TRACE [TransactionalId my-first-transactional-id] 
Received transactional response FindCoordinatorResponse(throttleTimeMs=0, 
errorMessage='null', error=NONE, node=worker12:9092 (id: 3 rack: null)) for 
request (type=FindCoordinatorRequest, coordinatorKey=my-first-transactional-id, 
coordinatorType=TRANSACTION) 
(org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:26:11,535] TRACE [TransactionalId my-first-transactional-id] 
Request (type=EndTxnRequest, transactionalId=my-first-transactional-id, 
producerId=2000, producerEpoch=0, result=COMMIT) dequeued for sending 
(org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:26:11,535] DEBUG [TransactionalId my-first-transactional-id] 
Disconnect from worker12:9092 (id: 3 rack: null) while trying to send request 
(type=EndTxnRequest, transactionalId=my-first-transactional-id, 
producerId=2000, producerEpoch=0, result=COMMIT). Going to back off and retry 
(org.apache.kafka.clients.producer.internals.Sender)
[2017-06-08 21:26:11,535] DEBUG [TransactionalId my-first-transactional-id] 
Enqueuing transactional request (type=FindCoordinatorRequest, 
coordinatorKey=my-first-transactional-id, coordinatorType=TRANSACTION) 
(org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-06-08 21:26:11,535] DEBUG [TransactionalId my-first-transactional-id] 
Enqueuing transactional request (type=EndTxnRequest, 
transactionalId=my-first-transactional-id, producerId=2000, producerEpoch=0, 
result=COMMIT) (org.apache.kafka.clients.prod

[jira] [Commented] (KAFKA-5415) TransactionCoordinator gets stuck in PrepareCommits state.

2017-06-08 Thread Apurva Mehta (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043731#comment-16043731
 ] 

Apurva Mehta commented on KAFKA-5415:
-

Logs from one such incident: 
https://issues.apache.org/jira/secure/attachment/12872179/6.tgz

> TransactionCoordinator gets stuck in PrepareCommits state.
> --
>
> Key: KAFKA-5415
> URL: https://issues.apache.org/jira/browse/KAFKA-5415
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
> Attachments: 6.tgz
>
>
> This has been revealed by the system test failures on jenkins. 
> The transaction coordinator seems to get into a path during the handling of 
> the EndTxnRequest where it returns an error (possibly a NOT_COORDINATOR or 
> COORDINATOR_NOT_AVAILABLE error, to be revealed by 
> https://github.com/apache/kafka/pull/3278) to the client. However, due to 
> network instability, the producer is disconnected before it receives this 
> error.
> As a result, the transaction remains in a `PrepareXX` state, and future 
> `EndTxn` requests sent by the client after reconnecting result in a 
> `CONCURRENT_TRANSACTIONS` error code. Hence the client gets stuck and the 
> transaction never finishes, as expiration isn't done from a PrepareXX state.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5415) TransactionCoordinator gets stuck in PrepareCommits state.

2017-06-08 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta updated KAFKA-5415:

Attachment: 6.tgz

> TransactionCoordinator gets stuck in PrepareCommits state.
> --
>
> Key: KAFKA-5415
> URL: https://issues.apache.org/jira/browse/KAFKA-5415
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
> Attachments: 6.tgz
>
>
> This has been revealed by the system test failures on jenkins. 
> The transaction coordinator seems to get into a path during the handling of 
> the EndTxnRequest where it returns an error (possibly a NOT_COORDINATOR or 
> COORDINATOR_NOT_AVAILABLE error, to be revealed by 
> https://github.com/apache/kafka/pull/3278) to the client. However, due to 
> network instability, the producer is disconnected before it receives this 
> error.
> As a result, the transaction remains in a `PrepareXX` state, and future 
> `EndTxn` requests sent by the client after reconnecting result in a 
> `CONCURRENT_TRANSACTIONS` error code. Hence the client gets stuck and the 
> transaction never finishes, as expiration isn't done from a PrepareXX state.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5415) TransactionCoordinator gets stuck in PrepareCommits state.

2017-06-08 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta updated KAFKA-5415:

Labels: exactly-once  (was: )

> TransactionCoordinator gets stuck in PrepareCommits state.
> --
>
> Key: KAFKA-5415
> URL: https://issues.apache.org/jira/browse/KAFKA-5415
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
> Attachments: 6.tgz
>
>
> This has been revealed by the system test failures on jenkins. 
> The transaction coordinator seems to get into a path during the handling of 
> the EndTxnRequest where it returns an error (possibly a NOT_COORDINATOR or 
> COORDINATOR_NOT_AVAILABLE error, to be revealed by 
> https://github.com/apache/kafka/pull/3278) to the client. However, due to 
> network instability, the producer is disconnected before it receives this 
> error.
> As a result, the transaction remains in a `PrepareXX` state, and future 
> `EndTxn` requests sent by the client after reconnecting result in a 
> `CONCURRENT_TRANSACTIONS` error code. Hence the client gets stuck and the 
> transaction never finishes, as expiration isn't done from a PrepareXX state.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5415) TransactionCoordinator get stuck in PrepareCommits state.

2017-06-08 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta updated KAFKA-5415:

Summary: TransactionCoordinator get stuck in PrepareCommits state.  (was: 
TransactionCoordinator )

> TransactionCoordinator get stuck in PrepareCommits state.
> -
>
> Key: KAFKA-5415
> URL: https://issues.apache.org/jira/browse/KAFKA-5415
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>Priority: Blocker
> Fix For: 0.11.0.0
>
>
> This has been revealed by the system test failures on jenkins. 
> The transaction coordinator seems to get into a path during the handling of 
> the EndTxnRequest where it returns an error (possibly a NOT_COORDINATOR or 
> COORDINATOR_NOT_AVAILABLE error, to be revealed by 
> https://github.com/apache/kafka/pull/3278) to the client. However, due to 
> network instability, the producer is disconnected before it receives this 
> error.
> As a result, the transaction remains in a `PrepareXX` state, and future 
> `EndTxn` requests sent by the client after reconnecting result in a 
> `CONCURRENT_TRANSACTIONS` error code. Hence the client gets stuck and the 
> transaction never finishes, as expiration isn't done from a PrepareXX state.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5415) TransactionCoordinator gets stuck in PrepareCommits state.

2017-06-08 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta updated KAFKA-5415:

Summary: TransactionCoordinator gets stuck in PrepareCommits state.  (was: 
TransactionCoordinator get stuck in PrepareCommits state.)

> TransactionCoordinator gets stuck in PrepareCommits state.
> --
>
> Key: KAFKA-5415
> URL: https://issues.apache.org/jira/browse/KAFKA-5415
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>Priority: Blocker
> Fix For: 0.11.0.0
>
>
> This has been revealed by the system test failures on jenkins. 
> The transaction coordinator seems to get into a path during the handling of 
> the EndTxnRequest where it returns an error (possibly a NOT_COORDINATOR or 
> COORDINATOR_NOT_AVAILABLE error, to be revealed by 
> https://github.com/apache/kafka/pull/3278) to the client. However, due to 
> network instability, the producer is disconnected before it receives this 
> error.
> As a result, the transaction remains in a `PrepareXX` state, and future 
> `EndTxn` requests sent by the client after reconnecting result in a 
> `CONCURRENT_TRANSACTIONS` error code. Hence the client gets stuck and the 
> transaction never finishes, as expiration isn't done from a PrepareXX state.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-5415) TransactionCoordinator

2017-06-08 Thread Apurva Mehta (JIRA)
Apurva Mehta created KAFKA-5415:
---

 Summary: TransactionCoordinator 
 Key: KAFKA-5415
 URL: https://issues.apache.org/jira/browse/KAFKA-5415
 Project: Kafka
  Issue Type: Bug
Reporter: Apurva Mehta
Assignee: Apurva Mehta
Priority: Blocker
 Fix For: 0.11.0.0


This has been revealed by the system test failures on jenkins. 

The transaction coordinator seems to get into a path during the handling of the 
EndTxnRequest where it returns an error (possibly a NOT_COORDINATOR or 
COORDINATOR_NOT_AVAILABLE error, to be revealed by 
https://github.com/apache/kafka/pull/3278) to the client. However, due to 
network instability, the producer is disconnected before it receives this error.

As a result, the transaction remains in a `PrepareXX` state, and future 
`EndTxn` requests sent by the client after reconnecting result in a 
`CONCURRENT_TRANSACTIONS` error code. Hence the client gets stuck and the 
transaction never finishes, as expiration isn't done from a PrepareXX state.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #3278: WIP: MINOR: Add logging and a small bug fix for th...

2017-06-08 Thread apurvam
GitHub user apurvam opened a pull request:

https://github.com/apache/kafka/pull/3278

WIP: MINOR: Add logging and a small bug fix for the transaction coordinator



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apurvam/kafka 
MINOR-add-logging-to-transaction-coordinator-in-all-failure-cases

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3278.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3278


commit 02b3816a626910ed9cdf08746c61de83d7c039aa
Author: Apurva Mehta 
Date:   2017-06-09T00:20:19Z

Add logging and a small bug fix for the transaction coordinator




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-5414) Console consumer offset commit regression

2017-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043720#comment-16043720
 ] 

ASF GitHub Bot commented on KAFKA-5414:
---

GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/3277

KAFKA-5414: Revert "KAFKA-5327: ConsoleConsumer should manually commi…

…t offsets for records that are returned in receive()"

This reverts commit d7d1196a0b542adb46d22eeb5b6d12af950b64c9.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka KAFKA-5414

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3277.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3277


commit 2676945c37f3526f73e0dbc0da66e14bb4f73d7e
Author: Jason Gustafson 
Date:   2017-06-09T00:26:01Z

KAFKA-5414: Revert "KAFKA-5327: ConsoleConsumer should manually commit 
offsets for records that are returned in receive()"

This reverts commit d7d1196a0b542adb46d22eeb5b6d12af950b64c9.




> Console consumer offset commit regression
> -
>
> Key: KAFKA-5414
> URL: https://issues.apache.org/jira/browse/KAFKA-5414
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Critical
> Fix For: 0.11.0.0
>
>
> In KAFKA-5327, the behavior of console consumer was changed to only commit 
> offsets when the process closes. Previously we used periodic offset commits. 
> This breaks existing usage in system tests and probably elsewhere.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #3277: KAFKA-5414: Revert "KAFKA-5327: ConsoleConsumer sh...

2017-06-08 Thread hachikuji
GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/3277

KAFKA-5414: Revert "KAFKA-5327: ConsoleConsumer should manually commi…

…t offsets for records that are returned in receive()"

This reverts commit d7d1196a0b542adb46d22eeb5b6d12af950b64c9.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka KAFKA-5414

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3277.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3277


commit 2676945c37f3526f73e0dbc0da66e14bb4f73d7e
Author: Jason Gustafson 
Date:   2017-06-09T00:26:01Z

KAFKA-5414: Revert "KAFKA-5327: ConsoleConsumer should manually commit 
offsets for records that are returned in receive()"

This reverts commit d7d1196a0b542adb46d22eeb5b6d12af950b64c9.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is back to normal : kafka-0.11.0-jdk7 #126

2017-06-08 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-5414) Console consumer offset commit regression

2017-06-08 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-5414:
--

 Summary: Console consumer offset commit regression
 Key: KAFKA-5414
 URL: https://issues.apache.org/jira/browse/KAFKA-5414
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson
Assignee: Jason Gustafson
Priority: Critical
 Fix For: 0.11.0.0


In KAFKA-5327, the behavior of console consumer was changed to only commit 
offsets when the process closes. Previously we used periodic offset commits. 
This breaks existing usage in system tests and probably elsewhere.
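
For illustration, a minimal sketch (not the actual ConsoleConsumer code; 
broker address, topic, and group id are made up) contrasting the two 
behaviors. With periodic commits inside the poll loop, progress survives a 
hard kill of the process; committing only when the consumer closes loses all 
of it, which is the regression described above:

{code}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OffsetCommitSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "console-consumer-sketch");
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test-topic"));
            long lastCommit = System.currentTimeMillis();
            while (!Thread.currentThread().isInterrupted()) {
                ConsumerRecords<String, String> records = consumer.poll(100L);
                records.forEach(r -> System.out.println(r.value()));
                // Periodic commit: progress survives a hard kill of the process.
                if (System.currentTimeMillis() - lastCommit > 5_000) {
                    consumer.commitSync();
                    lastCommit = System.currentTimeMillis();
                }
            }
            // Commit-only-on-close: relying solely on a commitSync() here,
            // instead of the periodic one above, is the changed behavior.
            consumer.commitSync();
        }
    }
}
{code}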



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5361) Add EOS integration tests for Streams API

2017-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043677#comment-16043677
 ] 

ASF GitHub Bot commented on KAFKA-5361:
---

GitHub user mjsax opened a pull request:

https://github.com/apache/kafka/pull/3276

KAFKA-5361: Add more integration tests for Streams EOS

 - multi-subtopology tests
 - fencing test

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mjsax/kafka 
kafka-5361-add-eos-integration-tests-for-streams-api

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3276.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3276


commit 93b4decb761d763dabec44c0d9a8b835fdd2a32f
Author: Matthias J. Sax 
Date:   2017-06-02T23:19:34Z

KAFKA-5361: Add more integration tests
 - multi-subtopology tests
 - fencing test




> Add EOS integration tests for Streams API
> -
>
> Key: KAFKA-5361
> URL: https://issues.apache.org/jira/browse/KAFKA-5361
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Affects Versions: 0.11.0.0
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
>
> We need to add more integration tests for Streams API with exactly-once 
> enabled.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: Pull-Requests not worked on

2017-06-08 Thread Ismael Juma
Hi all,

A few clarifications:

1. Anyone can see the console output for a failed job, a login is not
required for that.

2. One of the reasons why some PRs remain open even if they should be
closed is that we have no way to close PRs ourselves without referencing
them in a commit or asking Apache Infra for help. So, abandoned PRs tend to
remain open. This is very frustrating, but there is no solution in sight.

3. As Colin pointed out, the stability of the JUnit tests has improved a
lot in recent weeks. Thanks to everyone who has helped with this.

4. "retest this please" became available once I switched the Jenkins PR
builder to the community one (Apache previously only allowed the usage of
the Cloudbees Enterprise one). I updated the relevant wiki pages and I
think I sent an email to the mailing list at the time.

Ismael



On Fri, Jun 9, 2017 at 12:47 AM, James Cheng  wrote:

> I always thought that we had to repush a commit or close/open the PR in
> order to retrigger the build. This is the first time I've heard that you
> can do that.
>
> Does "retest this please" cause the tests to be re-run, simply because
> it's a comment on a PR? Or is it some magic with how we set things up? Or
> does someone simply end up reading the comment, and then pushing the button
> on behalf of the requester?
>
> -James
>
> > On Jun 8, 2017, at 1:48 PM, Colin McCabe  wrote:
> >
> > So, there has been some instability in the junit tests recently due to
> > some big features landing.  This is starting to improve a lot, though,
> > so hopefully it won't be a problem for much longer.
> >
> > If you do hit what you think is a bogus unit test failure, you can type
> > "retest this please" in the PR to trigger another junit run.
> >
> > It would be good to close some of the stale PRs.  I have some myself
> > that I should probably close since they have been superseded by
> > different approaches.
> >
> > With regard to OSGi specifically, it might be hard to find someone who
> > has the relevant expertise to review the patch.  Since our understanding
> > is limited, maybe it makes sense to have  someone else who understands
> > OSGi repackage the software for that system?  I'm just tossing out ideas
> > here, though-- I could be wrong.
> >
> > best,
> > Colin
> >
> >
> > On Tue, May 23, 2017, at 09:48, Jeff Widman wrote:
> >> I agree with this. As a new contributor, it was a bit demoralizing to
> >> look
> >> at all the open PR's and wonder whether when I sent a patch it would
> just
> >> be left to sit in the ether.
> >>
> >> In other projects I'm involved with, more typically the maintainers go
> >> through periodically and close old PR's that will never be merged. I
> know
> >> at this point it's an intimidating amount of work, but I still think
> it'd
> >> be useful to cut down this backlog.
> >>
> >> Maybe at the SF Kafka summit sprint have a group that does this? It's a
> >> decent task for n00bs who want to help but don't know where to start to
> >> ask
> >> them to help identify PR's that are ancient and should be closed as they
> >> will never be merged.
> >>
> >> On Tue, May 23, 2017 at 4:59 AM,  wrote:
> >>
> >>> Hello everyone
> >>>
> >>> I am wondering how pull-requests are handled for Kafka? There is
> currently
> >>> a huge amount of PRs on Github and most of them are not getting any
> >>> attention.
> >>>
> >>> If the maintainers only have a look at PR which passed the CI (which
> makes
> >>> sense due to the amount), then there is a problem, because the
> CI-pipeline
> >>> is not stable. I've submitted a PR myself which adds OSGi-metadata to
> the
> >>> kafka-clients artifact (see 2882). The pipeline fails randomly even
> though
> >>> the change only adds some entries to the manifest.
> >>> The next issue I have is, that people submitting PRs cannot have a
> look at
> >>> the failing CI job. So with regards to my PR, I don't have a clue what
> went
> >>> wrong.
> >>> If I am missing something in the process please let me know.
> >>> Regarding PR 2882, please consider merging because it would save the
> >>> osgi-community the effort of wrapping the kafka artifact and deploy it
> >>> with different coordinates on maven central (which can confuse users)
> >>> regards
> >>> Marc
> >>>
>
>


[GitHub] kafka pull request #3276: KAFKA-5361: Add more integration tests for Streams...

2017-06-08 Thread mjsax
GitHub user mjsax opened a pull request:

https://github.com/apache/kafka/pull/3276

KAFKA-5361: Add more integration tests for Streams EOS

 - multi-subtopology tests
 - fencing test

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mjsax/kafka 
kafka-5361-add-eos-integration-tests-for-streams-api

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3276.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3276


commit 93b4decb761d763dabec44c0d9a8b835fdd2a32f
Author: Matthias J. Sax 
Date:   2017-06-02T23:19:34Z

KAFKA-5361: Add more integration tests
 - multi-subtopology tests
 - fencing test




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: Pull-Requests not worked on

2017-06-08 Thread James Cheng
I always thought that we had to repush a commit or close/open the PR in order 
to retrigger the build. This is the first time I've heard that you can do that. 

Does "retest this please" cause the tests to be re-run, simply because it's a 
comment on a PR? Or is it some magic with how we set things up? Or does someone 
simply end up reading the comment, and then pushing the button on behalf of the 
requester?

-James

> On Jun 8, 2017, at 1:48 PM, Colin McCabe  wrote:
> 
> So, there has been some instability in the junit tests recently due to
> some big features landing.  This is starting to improve a lot, though,
> so hopefully it won't be a problem for much longer.
> 
> If you do hit what you think is a bogus unit test failure, you can type
> "retest this please" in the PR to trigger another junit run.
> 
> It would be good to close some of the stale PRs.  I have some myself
> that I should probably close since they have been superseded by
> different approaches.
> 
> With regard to OSGi specifically, it might be hard to find someone who
> has the relevant expertise to review the patch.  Since our understanding
> is limited, maybe it makes sense to have  someone else who understands
> OSGi repackage the software for that system?  I'm just tossing out ideas
> here, though-- I could be wrong.
> 
> best,
> Colin
> 
> 
> On Tue, May 23, 2017, at 09:48, Jeff Widman wrote:
>> I agree with this. As a new contributor, it was a bit demoralizing to
>> look
>> at all the open PR's and wonder whether when I sent a patch it would just
>> be left to sit in the ether.
>> 
>> In other projects I'm involved with, more typically the maintainers go
>> through periodically and close old PR's that will never be merged. I know
>> at this point it's an intimidating amount of work, but I still think it'd
>> be useful to cut down this backlog.
>> 
>> Maybe at the SF Kafka summit sprint have a group that does this? It's a
>> decent task for n00bs who want to help but don't know where to start: ask
>> them to help identify PR's that are ancient and should be closed as they
>> will never be merged.
>> 
>> On Tue, May 23, 2017 at 4:59 AM,  wrote:
>> 
>>> Hello everyone
>>> 
>>> I am wondering how pull-requests are handled for Kafka? There is currently
>>> a huge amount of PRs on Github and most of them are not getting any
>>> attention.
>>> 
>>> If the maintainers only have a look at PRs which passed the CI (which makes
>>> sense due to the amount), then there is a problem, because the CI-pipeline
>>> is not stable. I've submitted a PR myself which adds OSGi-metadata to the
>>> kafka-clients artifact (see 2882). The pipeline fails randomly even though
>>> the change only adds some entries to the manifest.
>>> The next issue I have is that people submitting PRs cannot have a look at
>>> the failing CI job. So with regards to my PR, I don't have a clue what went
>>> wrong.
>>> If I am missing something in the process please let me know.
>>> Regarding PR 2882, please consider merging because it would save the
>>> OSGi community the effort of wrapping the kafka artifact and deploying it
>>> with different coordinates on Maven Central (which can confuse users).
>>> regards
>>> Marc
>>> 



Build failed in Jenkins: kafka-trunk-jdk7 #2374

2017-06-08 Thread Apache Jenkins Server
See 


Changes:

[cshapi] KAFKA-5411: AdminClient javadoc and documentation improvements

[wangguoz] MINOR: disable flaky Streams EOS integration tests

--
[...truncated 956.76 KB...]

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch STARTED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch PASSED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize 
STARTED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize PASSED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests STARTED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests PASSED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch STARTED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch PASSED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression STARTED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression PASSED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic STARTED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic PASSED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest STARTED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest PASSED

kafka.integration.MetricsDuringTopicCreationDeletionTest > 
testMetricsDuringTopicCreateDelete STARTED

kafka.integration.MetricsDuringTopicCreationDeletionTest > 
testMetricsDuringTopicCreateDelete PASSED

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooLow 
STARTED

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooLow PASSED

kafka.integration.AutoOffsetResetTest > testResetToEarliestWhenOffsetTooLow 
STARTED

kafka.integration.AutoOffsetResetTest > testResetToEarliestWhenOffsetTooLow 
PASSED

kafka.integration.AutoOffsetResetTest > testResetToEarliestWhenOffsetTooHigh 
STARTED

kafka.integration.AutoOffsetResetTest > testResetToEarliestWhenOffsetTooHigh 
PASSED

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooHigh 
STARTED

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooHigh 
PASSED

kafka.integration.TopicMetadataTest > testIsrAfterBrokerShutDownAndJoinsBack 
STARTED

kafka.integration.TopicMetadataTest > testIsrAfterBrokerShutDownAndJoinsBack 
PASSED

kafka.integration.TopicMetadataTest > testAutoCreateTopicWithCollision STARTED

kafka.integration.TopicMetadataTest > testAutoCreateTopicWithCollision PASSED

kafka.integration.TopicMetadataTest > testAliveBrokerListWithNoTopics STARTED

kafka.integration.TopicMetadataTest > testAliveBrokerListWithNoTopics PASSED

kafka.integration.TopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.TopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.TopicMetadataTest > testGetAllTopicMetadata STARTED

kafka.integration.TopicMetadataTest > testGetAllTopicMetadata PASSED

kafka.integration.TopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.TopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.TopicMetadataTest > testBasicTopicMetadata STARTED

kafka.integration.TopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.TopicMetadataTest > testAutoCreateTopicWithInvalidReplication 
STARTED

kafka.integration.TopicMetadataTest > testAutoCreateTopicWithInvalidReplication 
PASSED

kafka.integration.TopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.TopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

kafka.integration.FetcherTest > testFetcher STARTED

kafka.integration.FetcherTest > testFetcher PASSED

kafka.integration.UncleanLeaderElectionTest > testUncleanLeaderElectionEnabled 
STARTED

kafka.integration.UncleanLeaderElectionTest > testUncleanLeaderElectionEnabled 
PASSED

kafka.integration.UncleanLeaderElectionTest > 
testCleanLeaderElectionDisabledByTopicOverride STARTED

kafka.integration.UncleanLeaderElectionTest > 
testCleanLeaderElectionDisabledByTopicOverride SKIPPED

kafka.integration.UncleanLeaderElectionTest > testUncleanLeaderElectionDisabled 
STARTED

kafka.integration.UncleanLeaderElectionTest > testUncleanLeaderElectionDisabled 
SKIPPED

kafka.integration.UncleanLeaderElectionTest > 
testUncleanLeaderElectionInvalidTopicOverride STARTED

kafka.integration.UncleanLeaderElectionTest > 
testUncleanLeaderElectionInvalidTopicOverride PASSED

kafka.integration.UncleanLeaderElectionTest > 
testUncleanLeaderElectionEnabledByTopicOverride STARTED

kafka.integration.UncleanLeaderElectionTest > 
testUncleanLeaderElectionEnabledByTopicOverride PASSED

kafka.integration.MinIsrConfigTest > testDefaultKafkaConfig STARTED

kafka.integration.MinIsrConfigTest > testDefaultKafkaConfig PASSED

unit.kafka.server.Kafka

[jira] [Created] (KAFKA-5413) Log cleaner fails due to large offset in segment file

2017-06-08 Thread Nicholas Ngorok (JIRA)
Nicholas Ngorok created KAFKA-5413:
--

 Summary: Log cleaner fails due to large offset in segment file
 Key: KAFKA-5413
 URL: https://issues.apache.org/jira/browse/KAFKA-5413
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.10.2.1, 0.10.2.0
 Environment: Ubuntu 14.04 LTS, Oracle Java 8u92, kafka_2.11-0.10.2.0
Reporter: Nicholas Ngorok


The log cleaner thread in our brokers is failing with the trace below

{noformat}
[2017-06-08 15:49:54,822] INFO {kafka-log-cleaner-thread-0} Cleaner 0: Cleaning 
segment 0 in log __consumer_offsets-12 (largest timestamp Thu Jun 08 15:48:59 
PDT 2017) into 0, retaining deletes. (kafka.log.LogCleaner)
[2017-06-08 15:49:54,822] INFO {kafka-log-cleaner-thread-0} Cleaner 0: Cleaning 
segment 2147343575 in log __consumer_offsets-12 (largest timestamp Thu Jun 08 
15:49:06 PDT 2017) into 0, retaining deletes. (kafka.log.LogCleaner)
[2017-06-08 15:49:54,834] ERROR {kafka-log-cleaner-thread-0} 
[kafka-log-cleaner-thread-0], Error due to  (kafka.log.LogCleaner)
java.lang.IllegalArgumentException: requirement failed: largest offset in 
message set can not be safely converted to relative offset.
at scala.Predef$.require(Predef.scala:224)
at kafka.log.LogSegment.append(LogSegment.scala:109)
at kafka.log.Cleaner.cleanInto(LogCleaner.scala:478)
at 
kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:405)
at 
kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:401)
at scala.collection.immutable.List.foreach(List.scala:381)
at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:401)
at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363)
at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:362)
at scala.collection.immutable.List.foreach(List.scala:381)
at kafka.log.Cleaner.clean(LogCleaner.scala:362)
at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241)
at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
[2017-06-08 15:49:54,835] INFO {kafka-log-cleaner-thread-0} 
[kafka-log-cleaner-thread-0], Stopped  (kafka.log.LogCleaner)
{noformat}

This seems to point at the specific line [here| 
https://github.com/apache/kafka/blob/0.11.0/core/src/main/scala/kafka/log/LogSegment.scala#L92]
 in the Kafka source, where the difference can actually be larger than MAXINT since both 
baseOffset and offset are of type long. It was introduced in this [pr| 
https://github.com/apache/kafka/pull/2210/files/56d1f8196b77a47b176b7bbd1e4220a3be827631]

These were the outputs of dumping the first two log segments

{noformat}
:~$ /usr/bin/kafka-run-class kafka.tools.DumpLogSegments --deep-iteration 
--files /kafka-logs/__consumer_offsets-12/000
0.log
Dumping /kafka-logs/__consumer_offsets-12/.log
Starting offset: 0
offset: 1810054758 position: 0 NoTimestampType: -1 isvalid: true payloadsize: -1 magic: 0 compresscodec: NONE crc: 3127861909 keysize: 34

:~$ /usr/bin/kafka-run-class kafka.tools.DumpLogSegments --deep-iteration 
--files /kafka-logs/__consumer_offsets-12/000
0002147343575.log
Dumping /kafka-logs/__consumer_offsets-12/002147343575.log
Starting offset: 2147343575
offset: 2147539884 position: 0 NoTimestampType: -1 isvalid: true payloadsize: -1 magic: 0 compresscodec: NONE crc: 2282192097 keysize: 34
{noformat}

My guess is that since 2147539884 is larger than MAXINT, we are hitting this 
exception. Was there a specific reason this check was added in 0.10.2?

E.g. if the first offset is a key = "key 0" and then we have MAXINT + 1 of "key 
1" following, wouldn't we run into this situation whenever the log cleaner runs?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #3275: HOTFIX: use atomic boolean for inject errors in st...

2017-06-08 Thread guozhangwang
GitHub user guozhangwang opened a pull request:

https://github.com/apache/kafka/pull/3275

HOTFIX: use atomic boolean for inject errors in streams eos integration test

Originally we assumed the task would be created exactly three times (twice 
upon starting up, once for each thread, and then one more time when rebalancing 
upon the thread failure). However, there is a likelihood that more than one 
rebalance will be triggered upon starting up, and hence the tasks will be 
initialized more than 3 times, i.e. there will be more than three hashcodes of 
the `Transformer` object, causing the `errorInjected` check to never be taken and 
the exception to never be thrown.

The current fix is to use an atomic boolean instead and let threads compete 
on compare-and-set to make sure exactly one thread will throw the exception, and 
will only throw it once.

Without this patch I can reproduce the issue on my local machine with a 
single core every 3-5 runs; with this patch I have been running successfully 
for 10+ runs.
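
For reference, a minimal sketch (not the actual test code; names are illustrative) of the compare-and-set pattern described above, where exactly one of the competing threads injects the error, and it is injected only once:

{code}
import java.util.concurrent.atomic.AtomicBoolean;

public class InjectOnce {
    // Shared flag: starts false, flipped to true by whichever thread wins the CAS.
    private static final AtomicBoolean errorInjected = new AtomicBoolean(false);

    static void maybeInjectError() {
        // compareAndSet returns true for exactly one caller, so the exception
        // is thrown exactly once no matter how many threads race here.
        if (errorInjected.compareAndSet(false, true)) {
            throw new RuntimeException("injected test error");
        }
    }
}
{code}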

Ping @mjsax @ijuma for reviews.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/guozhangwang/kafka 
KHotfix-eos-integration-test

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3275.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3275


commit ef0d8440a880ecdd93db3c043e4f9e6467e3aac3
Author: Guozhang Wang 
Date:   2017-06-08T23:36:27Z

use atomic boolean for inject errors




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (KAFKA-5382) Log recovery can fail if topic names contain one of the index suffixes

2017-06-08 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-5382.

   Resolution: Fixed
Fix Version/s: (was: 0.10.2.2)
   0.11.0.0.

Marking this resolved since we've fixed it for 0.11.0. We can consider it for a 
0.10.2 bug fix release (if we have one) in case anyone needs it.

> Log recovery can fail if topic names contain one of the index suffixes
> --
>
> Key: KAFKA-5382
> URL: https://issues.apache.org/jira/browse/KAFKA-5382
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.2.0, 0.10.2.1
>Reporter: Jason Gustafson
> Fix For: 0.11.0.0.
>
>
> Our log recovery logic fails in 0.10.2 and prior releases if the topic name 
> contains "index" or "timeindex." The issue is this snippet:
> {code}
> val logFile =
>   if (filename.endsWith(TimeIndexFileSuffix))
> new File(file.getAbsolutePath.replace(TimeIndexFileSuffix, 
> LogFileSuffix))
>   else
> new File(file.getAbsolutePath.replace(IndexFileSuffix, 
> LogFileSuffix))
> if(!logFile.exists) {
>   warn("Found an orphaned index file, %s, with no corresponding log 
> file.".format(file.getAbsolutePath))
>   file.delete()
> }
> {code}
> The {{replace}} is a global replace, so the substituted filename is incorrect 
> if the topic contains the index suffix.
> Note this is already fixed in trunk and 0.11.0.
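
As an aside (illustrative only; the topic name below is made up), the pitfall is that String.replace substitutes every occurrence, so a topic whose name contains the index suffix gets its directory segment rewritten as well, and the orphan-index check then looks for the log file at the wrong path:

{code}
public class GlobalReplacePitfall {
    public static void main(String[] args) {
        String indexPath = "/data/kafka-logs/my.index.topic-0/00000000000000000000.index";
        // Global replace rewrites the ".index" inside the topic directory name too.
        String logPath = indexPath.replace(".index", ".log");
        // Prints /data/kafka-logs/my.log.topic-0/00000000000000000000.log
        // instead of /data/kafka-logs/my.index.topic-0/00000000000000000000.log
        System.out.println(logPath);
    }
}
{code}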



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-2378) Add Connect embedded API

2017-06-08 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated KAFKA-2378:

Summary: Add Connect embedded API  (was: Add Copycat embedded API)

> Add Connect embedded API
> 
>
> Key: KAFKA-2378
> URL: https://issues.apache.org/jira/browse/KAFKA-2378
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect
>Reporter: Ewen Cheslack-Postava
>  Labels: needs-kip
>
> Much of the required Copycat API will exist from previous patches since any 
> main() method will need to do very similar operations. However, integrating 
> with any other Java code may require additional API support.
> For example, one of the use cases when integrating with any stream processing 
> application will require knowing which topics will be written to. We will 
> need to add APIs to expose the topics a registered connector is writing to so 
> they can be consumed by a stream processing task



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build is back to normal : kafka-trunk-jdk8 #1680

2017-06-08 Thread Apache Jenkins Server
See 




Build failed in Jenkins: kafka-0.11.0-jdk7 #125

2017-06-08 Thread Apache Jenkins Server
See 


Changes:

[wangguoz] KAFKA-5362: Add EOS system tests for Streams API

--
[...truncated 2.37 MB...]
org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
concurrentAccesses STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
concurrentAccesses PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldAllowToQueryAfterThreadDied STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldAllowToQueryAfterThreadDied PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryStateWithZeroSizedCache STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryStateWithZeroSizedCache PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryMapValuesAfterFilterState STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryMapValuesAfterFilterState PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryFilterState STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryFilterState PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryStateWithNonZeroSizedCache STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryStateWithNonZeroSizedCache PASSED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldLeftOuterJoinQueryable STARTED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldLeftOuterJoinQueryable PASSED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldInnerLeftJoin STARTED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldInnerLeftJoin PASSED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldOuterOuterJoin STARTED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldOuterOuterJoin PASSED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldInnerOuterJoin STARTED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldInnerOuterJoin PASSED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldLeftInnerJoin STARTED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldLeftInnerJoin PASSED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldInnerInnerJoinQueryable STARTED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldInnerInnerJoinQueryable PASSED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldInnerOuterJoinQueryable STARTED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldInnerOuterJoinQueryable PASSED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldOuterLeftJoin STARTED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldOuterLeftJoin PASSED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldOuterInnerJoin STARTED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldOuterInnerJoin PASSED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldInnerInnerJoin STARTED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldInnerInnerJoin PASSED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldLeftOuterJoin STARTED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldLeftOuterJoin PASSED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldOuterLeftJoinQueryable STARTED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldOuterLeftJoinQueryable PASSED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldLeftLeftJoin STARTED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldLeftLeftJoin PASSED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldOuterInnerJoinQueryable STARTED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldOuterInnerJoinQueryable PASSED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldLeftInnerJoinQueryable STARTED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationTest > 
shouldLeftInnerJoinQueryable PASSED

org.apache.kafka.streams.integration.KTableKTableJoinIntegrationT

[jira] [Commented] (KAFKA-5317) Update KIP-98 to reflect changes during implementation.

2017-06-08 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043562#comment-16043562
 ] 

Ismael Juma commented on KAFKA-5317:


Technically, it should be.

> Update KIP-98 to reflect changes during implementation.
> ---
>
> Key: KAFKA-5317
> URL: https://issues.apache.org/jira/browse/KAFKA-5317
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, core, producer 
>Reporter: Apurva Mehta
>Priority: Blocker
>  Labels: documentation, exactly-once
> Fix For: 0.11.0.0
>
>
> While implementing the EOS design, there are some minor (or major?) tweaks we 
> made as we got hands-on with the code base. We will compile all these changes in a 
> single pass once the code is ready, and this JIRA is for keeping track of 
> all the changes we made.
> 02/27/2017: collapse the two types of transactional log messages into a 
> single type with key as transactional id, and value as [pid, epoch, 
> transaction_timeout, transaction_state, [topic partition ] ]. Also using a 
> single memory map in cache instead of two on the TC.
> 03/01/2017: for pid expiration, we decided to use min(transactional id 
> expiration timeout, topic retention). For topics enabled for compaction only, 
> we just use the transactional timeout. If the retention setting is larger 
> than the transactional id expiration timeout, then the pid will be 
> "logically" expired (i.e. we will remove it from the cached pid mapping and 
> ignore it when rebuilding the cache from the log)
> 03/20/2017: add a new exception type in `o.a.k.common.errors` for invalid 
> transaction timeout values.
> 03/25/2017: extend WriteTxnMarkerRequest to contain multiple markers for 
> multiple PIDs with a single request.
> 04/20/2017: add transactionStartTime to TransactionMetadata
> 04/26/2017: added a new retriable error: Errors.CONCURRENT_TRANSACTIONS
> 04/01/2017: We also enforce acks=all on the client when idempotence is 
> enabled. Without this, we cannot guarantee idempotence.
> 04/10/2017: We also don't pass the underlying exception to 
> `RetriableOffsetCommitFailedException` anymore: 
> https://issues.apache.org/jira/browse/KAFKA-5052
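
For context, a minimal producer-side sketch of the 04/01/2017 point above; the configuration keys are the standard producer configs, everything else here is illustrative:

{code}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class IdempotentProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Enabling idempotence goes together with acks=all, per the note above;
        // with weaker acks the de-duplication guarantee cannot hold.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // send records as usual; retries are de-duplicated broker-side
        }
    }
}
{code}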



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build is back to normal : kafka-trunk-jdk7 #2373

2017-06-08 Thread Apache Jenkins Server
See 




[jira] [Commented] (KAFKA-3199) LoginManager should allow using an existing Subject

2017-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043552#comment-16043552
 ] 

ASF GitHub Bot commented on KAFKA-3199:
---

GitHub user utenakr opened a pull request:

https://github.com/apache/kafka/pull/3274

KAFKA-3199 LoginManager should allow using an existing Subject

LoginManager or KerberosLogin (for Kafka > 0.10) should allow using an 
existing Subject. If there's an existing Subject, the JAAS configuration won't 
be needed in getService().

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/utenakr/kafka trunk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3274.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3274


commit 95d6b98440f02fee23f8063ad082ec3dae4bd0b2
Author: Ji Sun 
Date:   2017-06-08T22:21:50Z

KAFKA-3199 LoginManager should allow using an existing Subject




> LoginManager should allow using an existing Subject
> ---
>
> Key: KAFKA-3199
> URL: https://issues.apache.org/jira/browse/KAFKA-3199
> Project: Kafka
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 0.9.0.0
>Reporter: Adam Kunicki
>Assignee: Adam Kunicki
>Priority: Critical
>
> LoginManager currently creates a new Login in the constructor which then 
> performs a login and starts a ticket renewal thread. The problem here is that 
> because Kafka performs its own login, it doesn't offer the ability to re-use 
> an existing subject that's already managed by the client application.
> The goal of LoginManager appears to be to be able to return a valid Subject. 
> It would be a simple fix to have LoginManager.acquireLoginManager() check for 
> a new config e.g. kerberos.use.existing.subject. 
> This would instead of creating a new Login in the constructor simply call 
> Subject.getSubject(AccessController.getContext()); to use the already logged 
> in Subject.
> This is also doable without introducing a new configuration and simply 
> checking if there is already a valid Subject available, but I think it may be 
> preferable to require that users explicitly request this behavior.
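
A minimal sketch of the lookup described above (illustrative only; the class and method names here are not from the Kafka code base):

{code}
import java.security.AccessController;
import javax.security.auth.Subject;

public class ExistingSubjectLookup {
    // Instead of performing its own JAAS login, the client could reuse whatever
    // Subject the surrounding application has already established on the
    // current access-control context.
    public static Subject currentSubject() {
        return Subject.getSubject(AccessController.getContext());
    }
}
{code}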



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #3274: KAFKA-3199 LoginManager should allow using an exis...

2017-06-08 Thread utenakr
GitHub user utenakr opened a pull request:

https://github.com/apache/kafka/pull/3274

KAFKA-3199 LoginManager should allow using an existing Subject

LoginManager or KerberosLogin (for Kafka > 0.10) should allow using an 
existing Subject. If there's an existing Subject, the JAAS configuration won't 
be needed in getService().

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/utenakr/kafka trunk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3274.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3274


commit 95d6b98440f02fee23f8063ad082ec3dae4bd0b2
Author: Ji Sun 
Date:   2017-06-08T22:21:50Z

KAFKA-3199 LoginManager should allow using an existing Subject




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #3272: MINOR: disable flaky Streams EOS integration tests

2017-06-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3272


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-5411) Generate javadoc for AdminClient and show configs in documentation

2017-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043546#comment-16043546
 ] 

ASF GitHub Bot commented on KAFKA-5411:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3271


> Generate javadoc for AdminClient and show configs in documentation
> --
>
> Key: KAFKA-5411
> URL: https://issues.apache.org/jira/browse/KAFKA-5411
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>Priority: Blocker
> Fix For: 0.11.0.0, 0.11.1.0
>
>
> Also fix the table of contents.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #3271: KAFKA-5411: AdminClient javadoc and documentation ...

2017-06-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3271


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-5411) Generate javadoc for AdminClient and show configs in documentation

2017-06-08 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated KAFKA-5411:

   Resolution: Fixed
Fix Version/s: 0.11.1.0
   Status: Resolved  (was: Patch Available)

Issue resolved by pull request 3271
[https://github.com/apache/kafka/pull/3271]

> Generate javadoc for AdminClient and show configs in documentation
> --
>
> Key: KAFKA-5411
> URL: https://issues.apache.org/jira/browse/KAFKA-5411
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>Priority: Blocker
> Fix For: 0.11.0.0, 0.11.1.0
>
>
> Also fix the table of contents.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #3272: MINOR: disable flaky Streams EOS integration tests

2017-06-08 Thread mjsax
GitHub user mjsax reopened a pull request:

https://github.com/apache/kafka/pull/3272

MINOR: disable flaky Streams EOS integration tests



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mjsax/kafka minor-disable-eos-tests

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3272.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3272


commit 07f9b8fc90ed13bb551c5806b1147d63ec4c2b88
Author: Matthias J. Sax 
Date:   2017-06-08T17:01:41Z

MINOR: disable flaky Streams EOS integration tests




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Info regarding kafka topic

2017-06-08 Thread BigData dev
Hi,
I have a 3 node Kafka Broker cluster.
I have created a topic, and the leader for the topic is broker 1 (1001). That
broker then died.
But when I look at the information in ZooKeeper for the topic, I see the leader
is still set to broker 1 (1001) and the ISR is set to 1001. Is this a bug in
Kafka? Now that the leader has died, the leader should have been set to none.

*[zk: localhost:2181(CONNECTED) 7] get
/brokers/topics/t3/partitions/0/state*

*{"controller_epoch":1,"leader":1001,"version":1,"leader_epoch":1,"isr":[1001]}*

*cZxid = 0x10078*

*ctime = Thu Jun 08 14:50:07 PDT 2017*

*mZxid = 0x1008c*

*mtime = Thu Jun 08 14:51:09 PDT 2017*

*pZxid = 0x10078*

*cversion = 0*

*dataVersion = 1*

*aclVersion = 0*

*ephemeralOwner = 0x0*

*dataLength = 78*

*numChildren = 0*

*[zk: localhost:2181(CONNECTED) 8] *


And when I use describe command the output is

*[root@meets2 kafka-broker]# bin/kafka-topics.sh --describe --topic t3
--zookeeper localhost:2181*

*Topic:t3 PartitionCount:1 ReplicationFactor:2 Configs:*

*Topic: t3 Partition: 0 Leader: 1001 Replicas: 1001,1003 Isr: 1001*


When I use the --unavailable-partitions option, I can see it correctly.

*[root@meets2 kafka-broker]# bin/kafka-topics.sh --describe --topic t3
--zookeeper localhost:2181 --unavailable-partitions*

* Topic: t3 Partition: 0 Leader: 1001 Replicas: 1001,1003 Isr: 1001*


But in the ZooKeeper topic state, the leader should have been set to none, not
left as the old leader, when the broker died. Is this by design, or
is it a bug in Kafka? Could you please provide any information on this?


*Thanks,*

*Bharat*


[jira] [Commented] (KAFKA-5317) Update KIP-98 to reflect changes during implementation.

2017-06-08 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043541#comment-16043541
 ] 

Gwen Shapira commented on KAFKA-5317:
-

I understand the importance, but is it really a release blocker?

> Update KIP-98 to reflect changes during implementation.
> ---
>
> Key: KAFKA-5317
> URL: https://issues.apache.org/jira/browse/KAFKA-5317
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, core, producer 
>Reporter: Apurva Mehta
>Priority: Blocker
>  Labels: documentation, exactly-once
> Fix For: 0.11.0.0
>
>
> While implementing the EOS design, there are some minor (or major?) tweaks we 
> made as we got hands-on with the code base. We will compile all these changes in a 
> single pass once the code is ready, and this JIRA is for keeping track of 
> all the changes we made.
> 02/27/2017: collapse the two types of transactional log messages into a 
> single type with key as transactional id, and value as [pid, epoch, 
> transaction_timeout, transaction_state, [topic partition ] ]. Also using a 
> single memory map in cache instead of two on the TC.
> 03/01/2017: for pid expiration, we decided to use min(transactional id 
> expiration timeout, topic retention). For topics enabled for compaction only, 
> we just use the transactional timeout. If the retention setting is larger 
> than the transactional id expiration timeout, then the pid will be 
> "logically" expired (i.e. we will remove it from the cached pid mapping and 
> ignore it when rebuilding the cache from the log)
> 03/20/2017: add a new exception type in `o.a.k.common.errors` for invalid 
> transaction timeout values.
> 03/25/2017: extend WriteTxnMarkerRequest to contain multiple markers for 
> multiple PIDs with a single request.
> 04/20/2017: add transactionStartTime to TransactionMetadata
> 04/26/2017: added a new retriable error: Errors.CONCURRENT_TRANSACTIONS
> 04/01/2017: We also enforce acks=all on the client when idempotence is 
> enabled. Without this, we cannot guarantee idempotence.
> 04/10/2017: We also don't pass the underlying exception to 
> `RetriableOffsetCommitFailedException` anymore: 
> https://issues.apache.org/jira/browse/KAFKA-5052



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4628) Support KTable/GlobalKTable Joins

2017-06-08 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043528#comment-16043528
 ] 

Guozhang Wang commented on KAFKA-4628:
--

[~dminkovsky] Thanks for sharing your use cases. We are actively working on the 
table / global table join semantics now, stay tuned.

> Support KTable/GlobalKTable Joins
> -
>
> Key: KAFKA-4628
> URL: https://issues.apache.org/jira/browse/KAFKA-4628
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Damian Guy
> Fix For: 0.11.1.0
>
>
> In KIP-99 we have added support for GlobalKTables, however we don't currently 
> support KTable/GlobalKTable joins as they require materializing a state store 
> for the join. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [VOTE] KIP-134: Delay initial consumer group rebalance

2017-06-08 Thread Guozhang Wang
Just recapping on client-side vs. broker-side config: we did discuss adding
this as a client-side config and bumping up the join-group request (I think
both Ismael and Ewen questioned it) to include this configured value
to the broker. I cannot remember any strong motivations against
going with the client-side config, except that we felt a default non-zero
value would benefit most users, assuming they start with more than one member
in their group, while only advanced users would really realize this config
exists and would tune it themselves.

I agree that we could re-consider it for the next release if we observe
that it is actually affecting more users than benefiting them.
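
For concreteness, the broker-side setting being discussed is the one introduced by
KIP-134 (group.initial.rebalance.delay.ms, as named in the KIP), so Jun's suggestion
amounts to shipping an override along these lines in config/server.properties:

{code}
# Sketch of the quickstart-friendly override discussed above: a single console
# consumer should not have to wait out the initial rebalance delay.
group.initial.rebalance.delay.ms=0
{code}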

Guozhang

On Wed, Jun 7, 2017 at 2:26 AM, Damian Guy  wrote:

> Hi Jun/Ismael,
>
> Sounds good to me.
>
> Thanks,
> Damian
>
> On Tue, 6 Jun 2017 at 23:08 Ismael Juma  wrote:
>
> > Hi Jun,
> >
> > The console consumer issue also came up in a conversation I was having
> > recently. Seems like the config/server.properties change is a reasonable
> > compromise given that we have other defaults that are for development.
> >
> > Ismael
> >
> > On Tue, Jun 6, 2017 at 10:59 PM, Jun Rao  wrote:
> >
> > > Hi, Everyone,
> > >
> > > Sorry for being late on this thread. I just came across this thread. I
> > have
> > > a couple of concerns on this. (1) It seems the amount of delay will be
> > > application specific. So, it seems that it's better for the delay to
> be a
> > > client side config instead of a server side one? (2) When running
> console
> > > consumer in quickstart, a minimum of 3 sec delay seems to be a bad
> > > experience for our users.
> > >
> > > Since we are getting late into the release cycle, it may be a bit too
> > late
> > > to make big changes in the 0.11 release. Perhaps we should at least
> > > consider overriding the delay in config/server.properties to 0 to
> improve
> > > the quickstart experience?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Tue, Apr 11, 2017 at 12:19 AM, Damian Guy 
> > wrote:
> > >
> > > > Hi Onur,
> > > >
> > > > It was in my previous email. But here it is again.
> > > >
> > > > 
> > > >
> > > > 1. Better rebalance timing. We will try to rebalance only when all
> the
> > > > consumers in a group have joined. The challenge would be someone has
> to
> > > > define what does ALL consumers mean, it could either be a time or
> > number
> > > of
> > > > consumers, etc.
> > > >
> > > > 2. Avoid frequent rebalance. For example, if there are 100 consumers
> > in a
> > > > group, today, in the worst case, we may end up with 100 rebalances
> even
> > > if
> > > > all the consumers joined the group in a reasonably small amount of
> > time.
> > > > Frequent rebalance is also a bad thing for brokers.
> > > >
> > > > Having a client side configuration may solve problem 1 better because
> > > each
> > > > consumer group can potentially configure their own timing. However,
> it
> > > does
> > > > not really prevent frequent rebalance in general because some of the
> > > > consumers can be misconfigured. (This may have something to do with
> > > KIP-124
> > > > as well. But if quota is applied on the JoinGroup/SyncGroup request
> it
> > > may
> > > > cause some unwanted cascading effects.)
> > > >
> > > > Having a broker side configuration may result in less flexibility for
> > > each
> > > > consumer group, but it can prevent frequent rebalance better. I think
> > > with
> > > > some reasonable design, the rebalance timing issue can be resolved on
> > the
> > > > broker side as well. Matthias had a good point on extending the delay
> > > when
> > > > a new consumer joins a group (we actually did something similar to
> > batch
> > > > ISR change propagation). For example, let's say on the broker side,
> we
> > > will
> > > > always delay 2 seconds each time we see a new consumer joining a
> > consumer
> > > > group. This would probably work for most of the consumer groups and
> > will
> > > > also limit the rebalance frequency to protect the brokers.
> > > >
> > > > I am not sure about the streams use case here, but if something like
> 2
> > > > seconds of delay is acceptable for streams, I would prefer adding the
> > > > configuration to the broker so that we can address both problems.
> > > >
> > > > On Thu, 6 Apr 2017 at 17:11 Onur Karaman <
> onurkaraman.apa...@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Hi Damian.
> > > > >
> > > > > Can you copy the point Becket made earlier that you say isn't
> > > addressed?
> > > > >
> > > > > On Thu, Apr 6, 2017 at 2:51 AM, Damian Guy 
> > > wrote:
> > > > >
> > > > > > Thanks all, the Vote is now closed and the KIP has been accepted
> > > with 9
> > > > > +1s
> > > > > >
> > > > > > 3 binding::
> > > > > > Guozhang,
> > > > > > Jason,
> > > > > > Ismael
> > > > > >
> > > > > > 6 non-binding:
> > > > > > Bill,
> > > > > > Eno,
> > > > > > Mathieu,
> > > > > > Matthias,
> > > > > > Dong,
> 

[jira] [Commented] (KAFKA-3199) LoginManager should allow using an existing Subject

2017-06-08 Thread Adam Kunicki (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043503#comment-16043503
 ] 

Adam Kunicki commented on KAFKA-3199:
-

Yes, thank you! [~jisunkim]

> LoginManager should allow using an existing Subject
> ---
>
> Key: KAFKA-3199
> URL: https://issues.apache.org/jira/browse/KAFKA-3199
> Project: Kafka
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 0.9.0.0
>Reporter: Adam Kunicki
>Assignee: Adam Kunicki
>Priority: Critical
>
> LoginManager currently creates a new Login in the constructor which then 
> performs a login and starts a ticket renewal thread. The problem here is that 
> because Kafka performs its own login, it doesn't offer the ability to re-use 
> an existing subject that's already managed by the client application.
> The goal of LoginManager appears to be to be able to return a valid Subject. 
> It would be a simple fix to have LoginManager.acquireLoginManager() check for 
> a new config e.g. kerberos.use.existing.subject. 
> This would instead of creating a new Login in the constructor simply call 
> Subject.getSubject(AccessController.getContext()); to use the already logged 
> in Subject.
> This is also doable without introducing a new configuration and simply 
> checking if there is already a valid Subject available, but I think it may be 
> preferable to require that users explicitly request this behavior.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [DISCUSS] KIP-150 - Kafka-Streams Cogroup

2017-06-08 Thread Guozhang Wang
Note that although the internal `AbstractStoreSupplier` does maintain the
key-value serdes, we do not enforce the interface of `StateStoreSupplier`
to always retain that information, and hence we cannot assume that
StateStoreSuppliers always retain key / value serdes.

On Thu, Jun 8, 2017 at 11:51 AM, Xavier Léauté  wrote:

> Another reason for the serde not to be in the first cogroup call is that
> the serde should not be required if you pass a StateStoreSupplier to
> aggregate()
>
> Regarding the aggregated type, I don't see why the initializer should be
> favored over the aggregator to define the type. In my mind separating the
> initializer into the last aggregate call clearly indicates that the
> initializer is independent of any of the aggregators or streams and that we
> don't wait for grouped1 events to initialize the co-group.
>
> On Thu, Jun 8, 2017 at 11:14 AM Guozhang Wang  wrote:
>
> > On a second thought... This is the current proposal API
> >
> >
> > ```
> >
> >  CogroupedKStream cogroup(final Initializer initializer,
> final
> > Aggregator aggregator, final Serde
> > aggValueSerde)
> >
> > ```
> >
> >
> > If we do not have the initializer in the first co-group it might be a bit
> > awkward for users to specify the aggregator that returns a typed 
> value?
> > Maybe it is still better to put these two functions in the same api?
> >
> >
> >
> > Guozhang
> >
> > On Thu, Jun 8, 2017 at 11:08 AM, Guozhang Wang 
> wrote:
> >
> > > This suggestion lgtm. I would vote for the first alternative than
> adding
> > > it to the `KStreamBuilder` though.
> > >
> > > On Thu, Jun 8, 2017 at 10:58 AM, Xavier Léauté 
> > > wrote:
> > >
> > >> I have a minor suggestion to make the API a little bit more symmetric.
> > >> I feel it would make more sense to move the initializer and serde to
> the
> > >> final aggregate statement, since the serde only applies to the state
> > >> store,
> > >> and the initializer doesn't bear any relation to the first group in
> > >> particular. It would end up looking like this:
> > >>
> > >> KTable cogrouped =
> > >> grouped1.cogroup(aggregator1)
> > >> .cogroup(grouped2, aggregator2)
> > >> .cogroup(grouped3, aggregator3)
> > >> .aggregate(initializer1, aggValueSerde, storeName1);
> > >>
> > >> Alternatively, we could move the the first cogroup() method to
> > >> KStreamBuilder, similar to how we have .merge()
> > >> and end up with an api that would be even more symmetric.
> > >>
> > >> KStreamBuilder.cogroup(grouped1, aggregator1)
> > >>   .cogroup(grouped2, aggregator2)
> > >>   .cogroup(grouped3, aggregator3)
> > >>   .aggregate(initializer1, aggValueSerde, storeName1);
> > >>
> > >> This doesn't have to be a blocker, but I thought it would make the API
> > >> just
> > >> a tad cleaner.
> > >>
> > >> On Tue, Jun 6, 2017 at 3:59 PM Guozhang Wang 
> > wrote:
> > >>
> > >> > Kyle,
> > >> >
> > >> > Thanks a lot for the updated KIP. It looks good to me.
> > >> >
> > >> >
> > >> > Guozhang
> > >> >
> > >> >
> > >> > On Fri, Jun 2, 2017 at 5:37 AM, Jim Jagielski 
> > wrote:
> > >> >
> > >> > > This makes much more sense to me. +1
> > >> > >
> > >> > > > On Jun 1, 2017, at 10:33 AM, Kyle Winkelman <
> > >> winkelman.k...@gmail.com>
> > >> > > wrote:
> > >> > > >
> > >> > > > I have updated the KIP and my PR. Let me know what you think.
> > >> > > > To created a cogrouped stream just call cogroup on a
> > KgroupedStream
> > >> and
> > >> > > > supply the initializer, aggValueSerde, and an aggregator. Then
> > >> continue
> > >> > > > adding kgroupedstreams and aggregators. Then call one of the
> many
> > >> > > aggregate
> > >> > > > calls to create a KTable.
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Kyle
> > >> > > >
> > >> > > > On Jun 1, 2017 4:03 AM, "Damian Guy" 
> > wrote:
> > >> > > >
> > >> > > >> Hi Kyle,
> > >> > > >>
> > >> > > >> Thanks for the update. I think just one initializer makes sense
> > as
> > >> it
> > >> > > >> should only be called once per key and generally it is just
> going
> > >> to
> > >> > > create
> > >> > > >> a new instance of whatever the Aggregate class is.
> > >> > > >>
> > >> > > >> Cheers,
> > >> > > >> Damian
> > >> > > >>
> > >> > > >> On Wed, 31 May 2017 at 20:09 Kyle Winkelman <
> > >> winkelman.k...@gmail.com
> > >> > >
> > >> > > >> wrote:
> > >> > > >>
> > >> > > >>> Hello all,
> > >> > > >>>
> > >> > > >>> I have spent some more time on this and the best alternative I
> > >> have
> > >> > > come
> > >> > > >> up
> > >> > > >>> with is:
> > >> > > >>> KGroupedStream has a single cogroup call that takes an
> > initializer
> > >> > and
> > >> > > an
> > >> > > >>> aggregator.
> > >> > > >>> CogroupedKStream has a cogroup call that takes additional
> > >> > groupedStream
> > >> > > >>> aggregator pairs.
> > >> > > >>> CogroupedKStream has multiple aggregate methods that create
> the
> > >> > > different
> > >> > > >>> stores.
> > >> > > >>>
> > >> > > >>> I plan on upd

[jira] [Comment Edited] (KAFKA-1955) Explore disk-based buffering in new Kafka Producer

2017-06-08 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043494#comment-16043494
 ] 

Jay Kreps edited comment on KAFKA-1955 at 6/8/17 9:43 PM:
--

I think the patch I submitted was kind of a cool hack, but after thinking about 
it I wasn't convinced it was really what you actually want.

Here are the considerations I thought we should probably think through:
1. How will recovery work? The patch I gave didn't have a mechanism to recover 
from a failure. I don't think this is really good enough. It means that it is 
okay if Kafka goes down, or if the app goes down, but not both. This helps but 
seems like not really what you want. But to properly handle app failure isn't 
that easy. For example, in the case of an OS crash the OS gives very weak 
guarantees on what is on disk for any data that hasn't been fsync'd. Not only 
can arbitrary bits of data be missing but it is even possible with some FS 
configurations to get arbitrary corrupt blocks that haven't been zero'd yet. I 
think to get this right you need a commit log and recovery procedure that 
verifies unsync'd data on startup. I'm not 100% sure you can do this with just 
the buffer pool, though maybe you can.
2. What are the ordering guarantees for buffered data?
3. How does this interact with transactions/EOS?
4. Should it be the case that all writes go through the commit log or should it 
be the case that only failures are journaled. If you journal all writes prior 
to sending to the server, the problem is that that amounts to significant 
overhead and leads to the possibility that logging or other I/O can slow you 
down. If you journal only failures you have the problem that your throughput 
may be very high in the non-failure scenario and then when Kafka goes down 
suddenly you start doing I/O but that is much slower and your throughput drops 
precipitously. Either may be okay but it is worth thinking through what the 
right behavior is.


was (Author: jkreps):
I think the patch I submitted was kind of a cool hack, but after thinking about 
it I wasn't convinced it was really what you actually want.

Here are the considerations I thought we should probably think through:
1. How will recovery work? The patch I gave didn't have a mechanism to recover 
from a failure. In the case of an OS crash the OS gives very weak guarantees on 
what is on disk for any data that hasn't been fsync'd. Not only can arbitrary 
bits of data be missing but it is even possible with some FS configurations to 
get arbitrary corrupt blocks that haven't been zero'd yet. I think to get this 
right you need a commit log and recovery procedure that verifies unsync'd data 
on startup. I'm not 100% sure you can do this with just the buffer pool, though 
maybe you can.
2. What are the ordering guarantees for buffered data?
3. How does this interact with transactions/EOS?
4. Should it be the case that all writes go through the commit log or should it 
be the case that only failures are journaled. If you journal all writes prior 
to sending to the server, the problem is that that amounts to significant 
overhead and leads to the possibility that logging or other I/O can slow you 
down. If you journal only failures you have the problem that your throughput 
may be very high in the non-failure scenario and then when Kafka goes down 
suddenly you start doing I/O but that is much slower and your throughput drops 
precipitously. Either may be okay but it is worth thinking through what the 
right behavior is.

> Explore disk-based buffering in new Kafka Producer
> --
>
> Key: KAFKA-1955
> URL: https://issues.apache.org/jira/browse/KAFKA-1955
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Affects Versions: 0.8.2.0
>Reporter: Jay Kreps
>Assignee: Jay Kreps
> Attachments: KAFKA-1955.patch, 
> KAFKA-1955-RABASED-TO-8th-AUG-2015.patch
>
>
> There are two approaches to using Kafka for capturing event data that has no 
> other "source of truth store":
> 1. Just write to Kafka and try hard to keep the Kafka cluster up as you would 
> a database.
> 2. Write to some kind of local disk store and copy from that to Kafka.
> The cons of the second approach are the following:
> 1. You end up depending on disks on all the producer machines. If you have 
> 10k producers, that is 10k places state is kept. These tend to fail a lot.
> 2. You can get data arbitrarily delayed
> 3. You still don't tolerate hard outages since there is no replication in the 
> producer tier
> 4. This tends to make problems with duplicates more common in certain failure 
> scenarios.
> There is one big pro, though: you don't have to keep Kafka running all the 
> time.
> So far we have done nothing in Kafka to help support 

[jira] [Comment Edited] (KAFKA-1955) Explore disk-based buffering in new Kafka Producer

2017-06-08 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043494#comment-16043494
 ] 

Jay Kreps edited comment on KAFKA-1955 at 6/8/17 9:41 PM:
--

I think the patch I submitted was kind of a cool hack, but after thinking about 
it I wasn't convinced it was really what you actually want.

Here are the considerations I thought we should probably think through:
1. How will recovery work? The patch I gave didn't have a mechanism to recover 
from a failure. In the case of an OS crash the OS gives very weak guarantees on 
what is on disk for any data that hasn't been fsync'd. Not only can arbitrary 
bits of data be missing but it is even possible with some FS configurations to 
get arbitrary corrupt blocks that haven't been zero'd yet. I think to get this 
right you need a commit log and recovery procedure that verifies unsync'd data 
on startup. I'm not 100% sure you can do this with just the buffer pool, though 
maybe you can.
2. What are the ordering guarantees for buffered data?
3. How does this interact with transactions/EOS?
4. Should it be the case that all writes go through the commit log or should it 
be the case that only failures are journaled. If you journal all writes prior 
to sending to the server, the problem is that that amounts to significant 
overhead and leads to the possibility that logging or other I/O can slow you 
down. If you journal only failures you have the problem that your throughput 
may be very high in the non-failure scenario and then when Kafka goes down 
suddenly you start doing I/O but that is much slower and your throughput drops 
precipitously. Either may be okay but it is worth thinking through what the 
right behavior is.


was (Author: jkreps):
I think the patch I submitted was kind of a cool hack, but after thinking about 
it I don't think it is really what you want.

Here are the considerations I thought we should probably think through:
1. How will recovery work? The patch I gave didn't have a mechanism to recover 
from a failure. In the case of an OS crash the OS gives very weak guarantees on 
what is on disk for any data that hasn't been fsync'd. Not only can arbitrary 
bits of data be missing but it is even possible with some FS configurations to 
get arbitrary corrupt blocks that haven't been zero'd yet. I think to get this 
right you need a commit log and recovery procedure that verifies unsync'd data 
on startup. I'm not 100% sure you can do this with just the buffer pool, though 
maybe you can.
2. What are the ordering guarantees for buffered data?
3. How does this interact with transactions/EOS?
4. Should it be the case that all writes go through the commit log or should it 
be the case that only failures are journaled. If you journal all writes prior 
to sending to the server, the problem is that that amounts to significant 
overhead and leads to the possibility that logging or other I/O can slow you 
down. If you journal only failures you have the problem that your throughput 
may be very high in the non-failure scenario and then when Kafka goes down 
suddenly you start doing I/O but that is much slower and your throughput drops 
precipitously. Either may be okay but it is worth thinking through what the 
right behavior is.

> Explore disk-based buffering in new Kafka Producer
> --
>
> Key: KAFKA-1955
> URL: https://issues.apache.org/jira/browse/KAFKA-1955
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Affects Versions: 0.8.2.0
>Reporter: Jay Kreps
>Assignee: Jay Kreps
> Attachments: KAFKA-1955.patch, 
> KAFKA-1955-RABASED-TO-8th-AUG-2015.patch
>
>
> There are two approaches to using Kafka for capturing event data that has no 
> other "source of truth store":
> 1. Just write to Kafka and try hard to keep the Kafka cluster up as you would 
> a database.
> 2. Write to some kind of local disk store and copy from that to Kafka.
> The cons of the second approach are the following:
> 1. You end up depending on disks on all the producer machines. If you have 
> 10,000 producers, that is 10k places state is kept. These tend to fail a lot.
> 2. You can get data arbitrarily delayed
> 3. You still don't tolerate hard outages since there is no replication in the 
> producer tier
> 4. This tends to make problems with duplicates more common in certain failure 
> scenarios.
> There is one big pro, though: you don't have to keep Kafka running all the 
> time.
> So far we have done nothing in Kafka to help support approach (2), but people 
> have built a lot of buffering things. It's not clear that this is necessarily 
> bad.
> However implementing this in the new Kafka producer might actually be quite 
> easy. Here is an idea for how to do it. Implementation of this id

[jira] [Commented] (KAFKA-1955) Explore disk-based buffering in new Kafka Producer

2017-06-08 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043494#comment-16043494
 ] 

Jay Kreps commented on KAFKA-1955:
--

I think the patch I submitted was kind of a cool hack, but after thinking about 
it I don't think it is really what you want.

Here are the considerations I thought we should probably think through:
1. How will recovery work? The patch I gave didn't have a mechanism to recover 
from a failure. In the case of an OS crash the OS gives very weak guarantees on 
what is on disk for any data that hasn't been fsync'd. Not only can arbitrary 
bits of data be missing, but with some FS configurations it is even possible to 
get arbitrary corrupt blocks that haven't been zero'd yet. I think to get this 
right you need a commit log and a recovery procedure that verifies unsync'd data 
on startup. I'm not 100% sure you can do this with just the buffer pool, though 
maybe you can.
2. What are the ordering guarantees for buffered data?
3. How does this interact with transactions/EOS?
4. Should all writes go through the commit log, or should only failures be 
journaled? If you journal all writes prior to sending to the server, the problem 
is that it amounts to significant overhead and leads to the possibility that 
logging or other I/O can slow you down. If you journal only failures, your 
throughput may be very high in the non-failure scenario, and then when Kafka 
goes down you suddenly start doing I/O that is much slower, so your throughput 
drops precipitously. Either may be okay but it is worth thinking through what 
the right behavior is.

> Explore disk-based buffering in new Kafka Producer
> --
>
> Key: KAFKA-1955
> URL: https://issues.apache.org/jira/browse/KAFKA-1955
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Affects Versions: 0.8.2.0
>Reporter: Jay Kreps
>Assignee: Jay Kreps
> Attachments: KAFKA-1955.patch, 
> KAFKA-1955-RABASED-TO-8th-AUG-2015.patch
>
>
> There are two approaches to using Kafka for capturing event data that has no 
> other "source of truth store":
> 1. Just write to Kafka and try hard to keep the Kafka cluster up as you would 
> a database.
> 2. Write to some kind of local disk store and copy from that to Kafka.
> The cons of the second approach are the following:
> 1. You end up depending on disks on all the producer machines. If you have 
> 10,000 producers, that is 10k places state is kept. These tend to fail a lot.
> 2. You can get data arbitrarily delayed
> 3. You still don't tolerate hard outages since there is no replication in the 
> producer tier
> 4. This tends to make problems with duplicates more common in certain failure 
> scenarios.
> There is one big pro, though: you don't have to keep Kafka running all the 
> time.
> So far we have done nothing in Kafka to help support approach (2), but people 
> have built a lot of buffering things. It's not clear that this is necessarily 
> bad.
> However implementing this in the new Kafka producer might actually be quite 
> easy. Here is an idea for how to do it. Implementation of this idea is 
> probably pretty easy but it would require some pretty thorough testing to see 
> if it was a success.
> The new producer maintains a pool of ByteBuffer instances which it attempts 
> to recycle and uses to buffer and send messages. When unsent data is queuing 
> waiting to be sent to the cluster it is hanging out in this pool.
> One approach to implementing a disk-backed buffer would be to slightly 
> generalize this so that the buffer pool has the option to use a mmap'd file 
> backend for its ByteBuffers. When the BufferPool was created with a 
> totalMemory setting of 1GB it would preallocate a 1GB sparse file and memory 
> map it, then chop the file into batchSize MappedByteBuffer pieces and 
> populate its buffer with those.
> Everything else would work normally except now all the buffered data would be 
> disk backed and in cases where there was significant backlog these would 
> start to fill up and page out.
> We currently allow messages larger than batchSize and to handle these we do a 
> one-off allocation of the necessary size. We would have to disallow this when 
> running in mmap mode. However since the disk buffer will be really big this 
> should not be a significant limitation as the batch size can be pretty big.
> We would want to ensure that the pooling always gives out the most recently 
> used ByteBuffer (I think it does). This way under normal operation where 
> requests are processed quickly a given buffer would be reused many times 
> before any physical disk write activity occurred.
> Note that although this lets the producer buffer very large amo
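
To make the mmap'd BufferPool idea above concrete, here is a minimal, hypothetical sketch (the class and method names are illustrative, not the producer's actual internals): preallocate a sparse file, map it, and slice the mapping into batch-sized MappedByteBuffers that are handed out most-recently-used first. A real pool would block when empty rather than fail.

{code}
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a file-backed buffer pool; not Kafka's BufferPool.
public class MmapBufferPool {
    private final Deque<MappedByteBuffer> free = new ArrayDeque<>();

    public MmapBufferPool(String path, long totalMemory, int batchSize) throws Exception {
        try (RandomAccessFile file = new RandomAccessFile(path, "rw")) {
            file.setLength(totalMemory); // preallocate a sparse file
            FileChannel channel = file.getChannel();
            for (long pos = 0; pos + batchSize <= totalMemory; pos += batchSize) {
                // each slice of the mapping backs one batch-sized buffer
                free.push(channel.map(FileChannel.MapMode.READ_WRITE, pos, batchSize));
            }
        } // mappings stay valid after the file/channel are closed
    }

    // LIFO reuse: hand out the most recently returned buffer so hot buffers
    // stay in the page cache and rarely cause physical disk writes.
    // For simplicity this throws if the pool is exhausted.
    public MappedByteBuffer allocate() {
        MappedByteBuffer buf = free.pop();
        buf.clear();
        return buf;
    }

    public void deallocate(MappedByteBuffer buffer) {
        free.push(buffer);
    }
}
{code}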

Re: [DISCUSS] KIP-150 - Kafka-Streams Cogroup

2017-06-08 Thread Kyle Winkelman
I chose the current way so if you make multiple tables you don't need to
supply the serde and initializer multiple times. It is true that you
wouldn't need the serde if you use a StateStoreSupplier, but I think we could
note that in the method call.

I am fine with the first option if that's what people like. Maybe Eno or
Damian can give their opinion.

I don't really like the KStreamBuilder option because I think it is kind of
hard to find unless you know it's there.

On Jun 8, 2017 1:51 PM, "Xavier Léauté"  wrote:

> Another reason for the serde not to be in the first cogroup call, is that
> the serde should not be required if you pass a StateStoreSupplier to
> aggregate()
>
> Regarding the aggregated type, I don't see why the initializer should be
> favored over the aggregator to define the type. In my mind separating the
> initializer into the last aggregate call clearly indicates that the
> initializer is independent of any of the aggregators or streams and that we
> don't wait for grouped1 events to initialize the co-group.
>
> On Thu, Jun 8, 2017 at 11:14 AM Guozhang Wang  wrote:
>
> > On second thought... This is the currently proposed API
> >
> >
> > ```
> >
> >  CogroupedKStream cogroup(final Initializer initializer,
> final
> > Aggregator aggregator, final Serde
> > aggValueSerde)
> >
> > ```
> >
> >
> > If we do not have the initializer in the first co-group it might be a bit
> > awkward for users to specify the aggregator that returns a typed value?
> > Maybe it is still better to put these two functions in the same API?
> >
> >
> >
> > Guozhang
> >
> > On Thu, Jun 8, 2017 at 11:08 AM, Guozhang Wang 
> wrote:
> >
> > > This suggestion lgtm. I would vote for the first alternative than
> adding
> > > it to the `KStreamBuilder` though.
> > >
> > > On Thu, Jun 8, 2017 at 10:58 AM, Xavier Léauté 
> > > wrote:
> > >
> > >> I have a minor suggestion to make the API a little bit more symmetric.
> > >> I feel it would make more sense to move the initializer and serde to
> the
> > >> final aggregate statement, since the serde only applies to the state
> > >> store,
> > >> and the initializer doesn't bear any relation to the first group in
> > >> particular. It would end up looking like this:
> > >>
> > >> KTable cogrouped =
> > >> grouped1.cogroup(aggregator1)
> > >> .cogroup(grouped2, aggregator2)
> > >> .cogroup(grouped3, aggregator3)
> > >> .aggregate(initializer1, aggValueSerde, storeName1);
> > >>
> > >> Alternatively, we could move the first cogroup() method to
> > >> KStreamBuilder, similar to how we have .merge()
> > >> and end up with an api that would be even more symmetric.
> > >>
> > >> KStreamBuilder.cogroup(grouped1, aggregator1)
> > >>   .cogroup(grouped2, aggregator2)
> > >>   .cogroup(grouped3, aggregator3)
> > >>   .aggregate(initializer1, aggValueSerde, storeName1);
> > >>
> > >> This doesn't have to be a blocker, but I thought it would make the API
> > >> just
> > >> a tad cleaner.
> > >>
> > >> On Tue, Jun 6, 2017 at 3:59 PM Guozhang Wang 
> > wrote:
> > >>
> > >> > Kyle,
> > >> >
> > >> > Thanks a lot for the updated KIP. It looks good to me.
> > >> >
> > >> >
> > >> > Guozhang
> > >> >
> > >> >
> > >> > On Fri, Jun 2, 2017 at 5:37 AM, Jim Jagielski 
> > wrote:
> > >> >
> > >> > > This makes much more sense to me. +1
> > >> > >
> > >> > > > On Jun 1, 2017, at 10:33 AM, Kyle Winkelman <
> > >> winkelman.k...@gmail.com>
> > >> > > wrote:
> > >> > > >
> > >> > > > I have updated the KIP and my PR. Let me know what you think.
> > >> > > > To create a cogrouped stream just call cogroup on a
> > KGroupedStream
> > >> and
> > >> > > > supply the initializer, aggValueSerde, and an aggregator. Then
> > >> continue
> > >> > > > adding KGroupedStreams and aggregators. Then call one of the
> many
> > >> > > aggregate
> > >> > > > calls to create a KTable.
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Kyle
> > >> > > >
> > >> > > > On Jun 1, 2017 4:03 AM, "Damian Guy" 
> > wrote:
> > >> > > >
> > >> > > >> Hi Kyle,
> > >> > > >>
> > >> > > >> Thanks for the update. I think just one initializer makes sense
> > as
> > >> it
> > >> > > >> should only be called once per key and generally it is just
> going
> > >> to
> > >> > > create
> > >> > > >> a new instance of whatever the Aggregate class is.
> > >> > > >>
> > >> > > >> Cheers,
> > >> > > >> Damian
> > >> > > >>
> > >> > > >> On Wed, 31 May 2017 at 20:09 Kyle Winkelman <
> > >> winkelman.k...@gmail.com
> > >> > >
> > >> > > >> wrote:
> > >> > > >>
> > >> > > >>> Hello all,
> > >> > > >>>
> > >> > > >>> I have spent some more time on this and the best alternative I
> > >> have
> > >> > > come
> > >> > > >> up
> > >> > > >>> with is:
> > >> > > >>> KGroupedStream has a single cogroup call that takes an
> > initializer
> > >> > and
> > >> > > an
> > >> > > >>> aggregator.
> > >> > > >>> CogroupedKStream has a cogroup call that takes additional
> > >> > groupedStream
> > >> > >

Re: Kafka-2170 & Kafka-1194

2017-06-08 Thread Colin McCabe
I think this is an area that needs some more work.  I don't think we'll
have a fix for it in 0.11.

best,
Colin

On Thu, Jun 1, 2017, at 05:51, Jacob Braaten wrote:
> Hello,
> 
> I am emailing to check on the status of the two bugs above. Both pertain
> to
> the same issue of not being able to delete old log segments on a Windows
> machine.
> 
> I am just curious if you know of a timetable for when these would be
> patched.
> 
> KAFKA-1194 
> KAFKA-2170 
> 
> Thank you,
> Jake


[jira] [Commented] (KAFKA-3199) LoginManager should allow using an existing Subject

2017-06-08 Thread Ji sun (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043484#comment-16043484
 ] 

Ji sun commented on KAFKA-3199:
---

Hi Adam, do you mind me taking this JIRA for Kafka 0.10?

> LoginManager should allow using an existing Subject
> ---
>
> Key: KAFKA-3199
> URL: https://issues.apache.org/jira/browse/KAFKA-3199
> Project: Kafka
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 0.9.0.0
>Reporter: Adam Kunicki
>Assignee: Adam Kunicki
>Priority: Critical
>
> LoginManager currently creates a new Login in the constructor which then 
> performs a login and starts a ticket renewal thread. The problem here is that 
> because Kafka performs its own login, it doesn't offer the ability to re-use 
> an existing subject that's already managed by the client application.
> The goal of LoginManager appears to be to return a valid Subject. 
> It would be a simple fix to have LoginManager.acquireLoginManager() check for 
> a new config, e.g. kerberos.use.existing.subject. 
> Instead of creating a new Login in the constructor, this would simply call 
> Subject.getSubject(AccessController.getContext()) to use the already 
> logged-in Subject.
> This is also doable without introducing a new configuration and simply 
> checking if there is already a valid Subject available, but I think it may be 
> preferable to require that users explicitly request this behavior.
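
For illustration, a small standalone sketch of the JAAS calls the description refers to; the Subject below is created inline purely for the example, standing in for one the client application would already have obtained through its own login:

{code}
import java.security.AccessController;
import java.security.PrivilegedAction;
import javax.security.auth.Subject;

public class ExistingSubjectExample {
    public static void main(String[] args) {
        // Stand-in for a Subject the application already holds after its own JAAS login.
        Subject existing = new Subject();

        // Code running inside Subject.doAs can recover that Subject from the
        // current AccessControlContext, which is the call the description
        // suggests LoginManager could use instead of performing its own login.
        Subject.doAs(existing, (PrivilegedAction<Void>) () -> {
            Subject current = Subject.getSubject(AccessController.getContext());
            System.out.println("Reusing existing subject: " + (current == existing));
            return null;
        });
    }
}
{code}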



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5402) JmxReporter Fetch metrics for kafka.server should not be created when client quotas are not enabled

2017-06-08 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043480#comment-16043480
 ] 

Jun Rao commented on KAFKA-5402:


Not sure if this is the same as KAFKA-3980. Currently, if a metric is not being 
actively updated (after the client is gone), the metric is supposed to be 
automatically removed after 1 hour.

Also, it seems that currently even if quota is not enabled, we still create the 
quota metric with the client-id. Not sure how useful those metrics are without 
enabling quota. Perhaps we should only create the metric if quota is enabled. 
[~rsivaram], what do you think?

> JmxReporter Fetch metrics for kafka.server should not be created when client 
> quotas are not enabled
> ---
>
> Key: KAFKA-5402
> URL: https://issues.apache.org/jira/browse/KAFKA-5402
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.1.1
>Reporter: Koelli Mungee
> Attachments: Fetch.jpg, Metrics.jpg
>
>
> JMXReporter kafka.server Fetch metrics should not be created when client 
> quotas are not enforced for client fetch requests. Currently, these metrics 
> are created and this can cause an OutOfMemoryError in the KafkaServer in 
> cases where a large number of consumers are being created rapidly.
> Attaching screenshots from a heapdump showing the 
> kafka.server:type=Fetch,client-id=consumer-358567 with different client.ids 
> from a kafkaserver where client quotas were not enabled.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5336) ListGroup requires Describe on Cluster, but the command-line AclCommand tool does not allow this to be set

2017-06-08 Thread Colin P. McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin P. McCabe updated KAFKA-5336:
---
Summary: ListGroup requires Describe on Cluster, but the command-line 
AclCommand tool does not allow this to be set  (was: The required ACL 
permission for ListGroup is invalid)

> ListGroup requires Describe on Cluster, but the command-line AclCommand tool 
> does not allow this to be set
> --
>
> Key: KAFKA-5336
> URL: https://issues.apache.org/jira/browse/KAFKA-5336
> Project: Kafka
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.10.2.1
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>Priority: Minor
> Fix For: 0.11.0.0
>
>
> The {{ListGroup}} API authorizes requests with _Describe_ access to the 
> cluster resource:
> {code}
>   def handleListGroupsRequest(request: RequestChannel.Request) {
> if (!authorize(request.session, Describe, Resource.ClusterResource)) {
>   sendResponseMaybeThrottle(request, requestThrottleMs =>
> ListGroupsResponse.fromError(requestThrottleMs, 
> Errors.CLUSTER_AUTHORIZATION_FAILED))
> } else {
>   ...
> {code}
>  However, the list of operations (or permissions) allowed for the cluster 
> resource does not include _Describe_:
> {code}
>   val ResourceTypeToValidOperations = Map[ResourceType, Set[Operation]] (
> ...
> Cluster -> Set(Create, ClusterAction, DescribeConfigs, AlterConfigs, 
> IdempotentWrite, All),
> ...
>   )
> {code}
> Only a user with _All_ cluster permission can successfully call the 
> {{ListGroup}} API. No other permission (not even any combination that does 
> not include _All_) would let a user use this API.
> The bug could be as simple as a typo in the API handler. Though it's not 
> obvious what actual permission was meant to be used there (perhaps 
> _DescribeConfigs_?)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka-site issue #60: Update delivery semantics section for KIP-98

2017-06-08 Thread hachikuji
Github user hachikuji commented on the issue:

https://github.com/apache/kafka-site/pull/60
  
cc @guozhangwang @apurva @ijuma 




[jira] [Resolved] (KAFKA-5336) The required ACL permission for ListGroup is invalid

2017-06-08 Thread Colin P. McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin P. McCabe resolved KAFKA-5336.

   Resolution: Duplicate
Fix Version/s: 0.11.0.0

KAFKA-5292 in 0.11 added {{Describe}} as a valid operation on {{Cluster}}

> The required ACL permission for ListGroup is invalid
> 
>
> Key: KAFKA-5336
> URL: https://issues.apache.org/jira/browse/KAFKA-5336
> Project: Kafka
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.10.2.1
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>Priority: Minor
> Fix For: 0.11.0.0
>
>
> The {{ListGroup}} API authorizes requests with _Describe_ access to the 
> cluster resource:
> {code}
>   def handleListGroupsRequest(request: RequestChannel.Request) {
> if (!authorize(request.session, Describe, Resource.ClusterResource)) {
>   sendResponseMaybeThrottle(request, requestThrottleMs =>
> ListGroupsResponse.fromError(requestThrottleMs, 
> Errors.CLUSTER_AUTHORIZATION_FAILED))
> } else {
>   ...
> {code}
>  However, the list of operations (or permissions) allowed for the cluster 
> resource does not include _Describe_:
> {code}
>   val ResourceTypeToValidOperations = Map[ResourceType, Set[Operation]] (
> ...
> Cluster -> Set(Create, ClusterAction, DescribeConfigs, AlterConfigs, 
> IdempotentWrite, All),
> ...
>   )
> {code}
> Only a user with _All_ cluster permission can successfully call the 
> {{ListGroup}} API. No other permission (not even any combination that does 
> not include _All_) would let a user use this API.
> The bug could be as simple as a typo in the API handler. Though it's not 
> obvious what actual permission was meant to be used there (perhaps 
> _DescribeConfigs_?)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka-site pull request #60: Update delivery semantics section for KIP-98

2017-06-08 Thread hachikuji
GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka-site/pull/60

Update delivery semantics section for KIP-98



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka-site update-delivery-semantics

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka-site/pull/60.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #60


commit 0759f3a22e377e12e857eb6da4977adb62261d30
Author: Jason Gustafson 
Date:   2017-06-08T21:28:13Z

Update delivery semantics section for KIP-98






[jira] [Commented] (KAFKA-3925) Default log.dir=/tmp/kafka-logs is unsafe

2017-06-08 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043472#comment-16043472
 ] 

Ismael Juma commented on KAFKA-3925:


Unit tests set the log.dir explicitly so they would not be affected by a change 
in the default.

> Default log.dir=/tmp/kafka-logs is unsafe
> -
>
> Key: KAFKA-3925
> URL: https://issues.apache.org/jira/browse/KAFKA-3925
> Project: Kafka
>  Issue Type: Bug
>  Components: config
>Affects Versions: 0.10.0.0
> Environment: Various, depends on OS and configuration
>Reporter: Peter Davis
>
> Many operating systems are configured to delete files under /tmp.  For 
> example Ubuntu has 
> [tmpreaper|http://manpages.ubuntu.com/manpages/wily/man8/tmpreaper.8.html], 
> others use tmpfs, others delete /tmp on startup. 
> Defaults are OK to make getting started easier but should not be unsafe 
> (risking data loss). 
> Something under /var would be a better default log.dir under *nix.  Or 
> relative to the Kafka bin directory to avoid needing root.  
> If the default cannot be changed, I would suggest printing a special warning to 
> the console on broker startup if log.dir is under /tmp. 
> See [users list 
> thread|http://mail-archives.apache.org/mod_mbox/kafka-users/201607.mbox/%3cCAD5tkZb-0MMuWJqHNUJ3i1+xuNPZ4tnQt-RPm65grxE0=0o...@mail.gmail.com%3e].
>   I've also been bitten by this. 
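
As a rough illustration of the warning suggested above (not actual broker startup code), the check could be as small as:

{code}
import java.nio.file.Path;
import java.nio.file.Paths;

public class LogDirCheck {
    // Minimal sketch of the suggested startup check, nothing more.
    static void warnIfUnderTmp(String logDir) {
        Path dir = Paths.get(logDir).toAbsolutePath().normalize();
        if (dir.startsWith(Paths.get("/tmp"))) {
            System.err.println("WARNING: log.dir " + dir
                    + " is under /tmp; many systems clean /tmp, so data may be lost.");
        }
    }

    public static void main(String[] args) {
        warnIfUnderTmp("/tmp/kafka-logs");
    }
}
{code}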



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-3925) Default log.dir=/tmp/kafka-logs is unsafe

2017-06-08 Thread Colin P. McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043469#comment-16043469
 ] 

Colin P. McCabe commented on KAFKA-3925:


The issue with {{/var}} is that if you run Kafka as an ordinary user, you 
cannot write to this directory.  Perhaps we could write to 
{{$HOME/kafka-logs}}?  Although that would imply a different location for 
different users.

I also think a lot of unit tests rely on the {{/tmp}} location.

> Default log.dir=/tmp/kafka-logs is unsafe
> -
>
> Key: KAFKA-3925
> URL: https://issues.apache.org/jira/browse/KAFKA-3925
> Project: Kafka
>  Issue Type: Bug
>  Components: config
>Affects Versions: 0.10.0.0
> Environment: Various, depends on OS and configuration
>Reporter: Peter Davis
>
> Many operating systems are configured to delete files under /tmp.  For 
> example Ubuntu has 
> [tmpreaper|http://manpages.ubuntu.com/manpages/wily/man8/tmpreaper.8.html], 
> others use tmpfs, others delete /tmp on startup. 
> Defaults are OK to make getting started easier but should not be unsafe 
> (risking data loss). 
> Something under /var would be a better default log.dir under *nix.  Or 
> relative to the Kafka bin directory to avoid needing root.  
> If the default cannot be changed, I would suggest printing a special warning to 
> the console on broker startup if log.dir is under /tmp. 
> See [users list 
> thread|http://mail-archives.apache.org/mod_mbox/kafka-users/201607.mbox/%3cCAD5tkZb-0MMuWJqHNUJ3i1+xuNPZ4tnQt-RPm65grxE0=0o...@mail.gmail.com%3e].
>   I've also been bitten by this. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build is back to normal : kafka-0.11.0-jdk7 #124

2017-06-08 Thread Apache Jenkins Server
See 




Build failed in Jenkins: kafka-trunk-jdk7 #2372

2017-06-08 Thread Apache Jenkins Server
See 


Changes:

[jason] HOTFIX: for flaky Streams EOS integration tests

--
[...truncated 2.40 MB...]
org.apache.kafka.streams.kstream.internals.KStreamKStreamLeftJoinTest > 
testLeftJoin STARTED

org.apache.kafka.streams.kstream.internals.KStreamKStreamLeftJoinTest > 
testLeftJoin PASSED

org.apache.kafka.streams.kstream.internals.KStreamKStreamLeftJoinTest > 
testWindowing STARTED

org.apache.kafka.streams.kstream.internals.KStreamKStreamLeftJoinTest > 
testWindowing PASSED

org.apache.kafka.streams.kstream.internals.KTableForeachTest > testForeach 
STARTED

org.apache.kafka.streams.kstream.internals.KTableForeachTest > testForeach 
PASSED

org.apache.kafka.streams.kstream.internals.KTableForeachTest > testTypeVariance 
STARTED

org.apache.kafka.streams.kstream.internals.KTableForeachTest > testTypeVariance 
PASSED

org.apache.kafka.streams.kstream.internals.KStreamPrintTest > 
testPrintKeyValueWithName STARTED

org.apache.kafka.streams.kstream.internals.KStreamPrintTest > 
testPrintKeyValueWithName PASSED

org.apache.kafka.streams.kstream.internals.GlobalKTableJoinsTest > 
shouldInnerJoinWithStream STARTED

org.apache.kafka.streams.kstream.internals.GlobalKTableJoinsTest > 
shouldInnerJoinWithStream PASSED

org.apache.kafka.streams.kstream.internals.GlobalKTableJoinsTest > 
shouldLeftJoinWithStream STARTED

org.apache.kafka.streams.kstream.internals.GlobalKTableJoinsTest > 
shouldLeftJoinWithStream PASSED

org.apache.kafka.streams.kstream.internals.KStreamMapValuesTest > 
testFlatMapValues STARTED

org.apache.kafka.streams.kstream.internals.KStreamMapValuesTest > 
testFlatMapValues PASSED

org.apache.kafka.streams.kstream.internals.KStreamFlatMapTest > testFlatMap 
STARTED

org.apache.kafka.streams.kstream.internals.KStreamFlatMapTest > testFlatMap 
PASSED

org.apache.kafka.streams.kstream.internals.WindowedStreamPartitionerTest > 
testCopartitioning STARTED

org.apache.kafka.streams.kstream.internals.WindowedStreamPartitionerTest > 
testCopartitioning PASSED

org.apache.kafka.streams.kstream.internals.WindowedStreamPartitionerTest > 
testWindowedSerializerNoArgConstructors STARTED

org.apache.kafka.streams.kstream.internals.WindowedStreamPartitionerTest > 
testWindowedSerializerNoArgConstructors PASSED

org.apache.kafka.streams.kstream.internals.WindowedStreamPartitionerTest > 
testWindowedDeserializerNoArgConstructors STARTED

org.apache.kafka.streams.kstream.internals.WindowedStreamPartitionerTest > 
testWindowedDeserializerNoArgConstructors PASSED

org.apache.kafka.streams.kstream.internals.KStreamKTableLeftJoinTest > testJoin 
STARTED

org.apache.kafka.streams.kstream.internals.KStreamKTableLeftJoinTest > testJoin 
PASSED

org.apache.kafka.streams.kstream.internals.UnlimitedWindowTest > 
shouldAlwaysOverlap STARTED

org.apache.kafka.streams.kstream.internals.UnlimitedWindowTest > 
shouldAlwaysOverlap PASSED

org.apache.kafka.streams.kstream.internals.UnlimitedWindowTest > 
cannotCompareUnlimitedWindowWithDifferentWindowType STARTED

org.apache.kafka.streams.kstream.internals.UnlimitedWindowTest > 
cannotCompareUnlimitedWindowWithDifferentWindowType PASSED

org.apache.kafka.streams.kstream.internals.KTableImplTest > 
shouldNotAllowNullMapperOnMapValues STARTED

org.apache.kafka.streams.kstream.internals.KTableImplTest > 
shouldNotAllowNullMapperOnMapValues PASSED

org.apache.kafka.streams.kstream.internals.KTableImplTest > 
shouldNotAllowNullPredicateOnFilter STARTED

org.apache.kafka.streams.kstream.internals.KTableImplTest > 
shouldNotAllowNullPredicateOnFilter PASSED

org.apache.kafka.streams.kstream.internals.KTableImplTest > 
shouldNotAllowNullOtherTableOnJoin STARTED

org.apache.kafka.streams.kstream.internals.KTableImplTest > 
shouldNotAllowNullOtherTableOnJoin PASSED

org.apache.kafka.streams.kstream.internals.KTableImplTest > 
shouldNotAllowNullSelectorOnGroupBy STARTED

org.apache.kafka.streams.kstream.internals.KTableImplTest > 
shouldNotAllowNullSelectorOnGroupBy PASSED

org.apache.kafka.streams.kstream.internals.KTableImplTest > 
shouldNotAllowNullJoinerOnLeftJoin STARTED

org.apache.kafka.streams.kstream.internals.KTableImplTest > 
shouldNotAllowNullJoinerOnLeftJoin PASSED

org.apache.kafka.streams.kstream.internals.KTableImplTest > testStateStore 
STARTED

org.apache.kafka.streams.kstream.internals.KTableImplTest > testStateStore 
PASSED

org.apache.kafka.streams.kstream.internals.KTableImplTest > 
shouldNotAllowEmptyFilePathOnWriteAsText STARTED

org.apache.kafka.streams.kstream.internals.KTableImplTest > 
shouldNotAllowEmptyFilePathOnWriteAsText PASSED

org.apache.kafka.streams.kstream.internals.KTableImplTest > 
shouldNotAllowNullOtherTableOnLeftJoin STARTED

org.apache.kafka.streams.kstream.internals.KTableImplTest > 
shouldNotAllowNullOtherTableOnLeftJoin PASSED

org.apache.kafka.streams.kstream.internals.KTableImplTest > 
shouldNotAllowNullSelectorOnToStrea

[jira] [Commented] (KAFKA-1955) Explore disk-based buffering in new Kafka Producer

2017-06-08 Thread Colin P. McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043458#comment-16043458
 ] 

Colin P. McCabe commented on KAFKA-1955:


As Jay wrote, there are some potential problems with the disk-based buffering 
approach:

{quote}
The cons of the second approach are the following:
1. You end up depending on disks on all the producer machines. If you have 
10,000 producers, that is 10k places state is kept. These tend to fail a lot.
2. You can get data arbitrarily delayed
3. You still don't tolerate hard outages since there is no replication in the 
producer tier
4. This tends to make problems with duplicates more common in certain failure 
scenarios.
{quote}

Do we have potential solutions for these?

bq. I believe a malloc/free implementation over `MappedByteBuffer` will be the 
best choice. This will allow the producer buffers to use a file like a piece of 
memory at the cost of maintaining a more complex free list.

How do you plan on ensuring that the messages are written to disk in a timely 
fashion?  It seems possible that you could lose quite a lot of data if you lose 
power before the memory-mapped regions are written back to disk.  Also, a 
malloc implementation adds quite a lot of complexity-- are we sure it's worth it?

If we are going to do this, we'd probably want to start with something like an 
append-only log on which we call {{fsync}} periodically.  Also, we would 
need a KIP...
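
A minimal sketch of that kind of append-only journal, under the assumptions of length-prefixed records and a time-based fsync policy (the names and framing are hypothetical, not a proposed design for the producer):

{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical append-only journal: length-prefixed records, fsync'd periodically.
public class AppendOnlyJournal implements AutoCloseable {
    private final FileChannel channel;
    private final long flushIntervalMs;
    private long lastFlushMs = System.currentTimeMillis();

    public AppendOnlyJournal(String path, long flushIntervalMs) throws IOException {
        this.channel = FileChannel.open(Paths.get(path),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        this.flushIntervalMs = flushIntervalMs;
    }

    public synchronized void append(byte[] record) throws IOException {
        // 4-byte length prefix followed by the record bytes
        ByteBuffer buf = ByteBuffer.allocate(4 + record.length);
        buf.putInt(record.length).put(record).flip();
        while (buf.hasRemaining())
            channel.write(buf);
        long now = System.currentTimeMillis();
        if (now - lastFlushMs >= flushIntervalMs) {
            channel.force(false); // fsync the data written so far
            lastFlushMs = now;
        }
    }

    @Override
    public void close() throws IOException {
        channel.force(false);
        channel.close();
    }
}
{code}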

> Explore disk-based buffering in new Kafka Producer
> --
>
> Key: KAFKA-1955
> URL: https://issues.apache.org/jira/browse/KAFKA-1955
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Affects Versions: 0.8.2.0
>Reporter: Jay Kreps
>Assignee: Jay Kreps
> Attachments: KAFKA-1955.patch, 
> KAFKA-1955-RABASED-TO-8th-AUG-2015.patch
>
>
> There are two approaches to using Kafka for capturing event data that has no 
> other "source of truth store":
> 1. Just write to Kafka and try hard to keep the Kafka cluster up as you would 
> a database.
> 2. Write to some kind of local disk store and copy from that to Kafka.
> The cons of the second approach are the following:
> 1. You end up depending on disks on all the producer machines. If you have 
> 10,000 producers, that is 10k places state is kept. These tend to fail a lot.
> 2. You can get data arbitrarily delayed
> 3. You still don't tolerate hard outages since there is no replication in the 
> producer tier
> 4. This tends to make problems with duplicates more common in certain failure 
> scenarios.
> There is one big pro, though: you don't have to keep Kafka running all the 
> time.
> So far we have done nothing in Kafka to help support approach (2), but people 
> have built a lot of buffering things. It's not clear that this is necessarily 
> bad.
> However implementing this in the new Kafka producer might actually be quite 
> easy. Here is an idea for how to do it. Implementation of this idea is 
> probably pretty easy but it would require some pretty thorough testing to see 
> if it was a success.
> The new producer maintains a pool of ByteBuffer instances which it attempts 
> to recycle and uses to buffer and send messages. When unsent data is queuing 
> waiting to be sent to the cluster it is hanging out in this pool.
> One approach to implementing a disk-backed buffer would be to slightly 
> generalize this so that the buffer pool has the option to use a mmap'd file 
> backend for its ByteBuffers. When the BufferPool was created with a 
> totalMemory setting of 1GB it would preallocate a 1GB sparse file and memory 
> map it, then chop the file into batchSize MappedByteBuffer pieces and 
> populate its buffer with those.
> Everything else would work normally except now all the buffered data would be 
> disk backed and in cases where there was significant backlog these would 
> start to fill up and page out.
> We currently allow messages larger than batchSize and to handle these we do a 
> one-off allocation of the necessary size. We would have to disallow this when 
> running in mmap mode. However since the disk buffer will be really big this 
> should not be a significant limitation as the batch size can be pretty big.
> We would want to ensure that the pooling always gives out the most recently 
> used ByteBuffer (I think it does). This way under normal operation where 
> requests are processed quickly a given buffer would be reused many times 
> before any physical disk write activity occurred.
> Note that although this lets the producer buffer very large amounts of data, 
> the buffer isn't really fault-tolerant, since the ordering in the file isn't 
> known, so there is no easy way to recover the producer's buffer in a failure. 
> So the scope of this feature would just be to

[jira] [Commented] (KAFKA-5362) Add EOS system tests for Streams API

2017-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043446#comment-16043446
 ] 

ASF GitHub Bot commented on KAFKA-5362:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3201


> Add EOS system tests for Streams API
> 
>
> Key: KAFKA-5362
> URL: https://issues.apache.org/jira/browse/KAFKA-5362
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, system tests
>Affects Versions: 0.11.0.0
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
>
> We need to add more system tests for Streams API with exactly-once enabled.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (KAFKA-5362) Add EOS system tests for Streams API

2017-06-08 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-5362.
--
Resolution: Fixed

Issue resolved by pull request 3201
[https://github.com/apache/kafka/pull/3201]

> Add EOS system tests for Streams API
> 
>
> Key: KAFKA-5362
> URL: https://issues.apache.org/jira/browse/KAFKA-5362
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, system tests
>Affects Versions: 0.11.0.0
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
>
> We need to add more system tests for Streams API with exactly-once enabled.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #3201: KAFKA-5362: Add EOS system tests for Streams API

2017-06-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3201




Build failed in Jenkins: kafka-trunk-jdk8 #1679

2017-06-08 Thread Apache Jenkins Server
See 


Changes:

[wangguoz] KAFKA-5357 follow-up: Yammer metrics, not Kafka Metrics

--
[...truncated 4.58 MB...]

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldCount PASSED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldGroupByKey STARTED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldGroupByKey PASSED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldCountWithInternalStore STARTED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldCountWithInternalStore PASSED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldReduceWindowed STARTED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldReduceWindowed PASSED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldCountSessionWindows STARTED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldCountSessionWindows PASSED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldAggregateWindowed STARTED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldAggregateWindowed PASSED

org.apache.kafka.streams.integration.FanoutIntegrationTest > 
shouldFanoutTheInput STARTED

org.apache.kafka.streams.integration.FanoutIntegrationTest > 
shouldFanoutTheInput PASSED

org.apache.kafka.streams.integration.EosIntegrationTest > 
shouldBeAbleToRunWithEosEnabled STARTED

org.apache.kafka.streams.integration.EosIntegrationTest > 
shouldBeAbleToRunWithEosEnabled PASSED

org.apache.kafka.streams.integration.EosIntegrationTest > 
shouldBeAbleToCommitToMultiplePartitions STARTED

org.apache.kafka.streams.integration.EosIntegrationTest > 
shouldBeAbleToCommitToMultiplePartitions PASSED

org.apache.kafka.streams.integration.EosIntegrationTest > 
shouldNotViolateEosIfOneTaskFails STARTED

org.apache.kafka.streams.integration.EosIntegrationTest > 
shouldNotViolateEosIfOneTaskFails FAILED
java.lang.AssertionError: Condition not met within timeout 6. Should 
receive uncaught exception from one StreamThread.
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:274)
at 
org.apache.kafka.streams.integration.EosIntegrationTest.shouldNotViolateEosIfOneTaskFails(EosIntegrationTest.java:406)

org.apache.kafka.streams.integration.EosIntegrationTest > 
shouldBeAbleToPerformMultipleTransactions STARTED

org.apache.kafka.streams.integration.EosIntegrationTest > 
shouldBeAbleToPerformMultipleTransactions PASSED

org.apache.kafka.streams.integration.EosIntegrationTest > 
shouldBeAbleToCommitMultiplePartitionOffsets STARTED

org.apache.kafka.streams.integration.EosIntegrationTest > 
shouldBeAbleToCommitMultiplePartitionOffsets PASSED

org.apache.kafka.streams.integration.EosIntegrationTest > 
shouldNotViolateEosIfOneTaskFailsWithState STARTED

org.apache.kafka.streams.integration.EosIntegrationTest > 
shouldNotViolateEosIfOneTaskFailsWithState PASSED

org.apache.kafka.streams.integration.EosIntegrationTest > 
shouldBeAbleToRestartAfterClose STARTED

org.apache.kafka.streams.integration.EosIntegrationTest > 
shouldBeAbleToRestartAfterClose PASSED

org.apache.kafka.streams.integration.GlobalKTableIntegrationTest > 
shouldKStreamGlobalKTableLeftJoin STARTED

org.apache.kafka.streams.integration.GlobalKTableIntegrationTest > 
shouldKStreamGlobalKTableLeftJoin PASSED

org.apache.kafka.streams.integration.GlobalKTableIntegrationTest > 
shouldKStreamGlobalKTableJoin STARTED

org.apache.kafka.streams.integration.GlobalKTableIntegrationTest > 
shouldKStreamGlobalKTableJoin PASSED

org.apache.kafka.streams.integration.InternalTopicIntegrationTest > 
shouldCompactTopicsForStateChangelogs STARTED

org.apache.kafka.streams.integration.InternalTopicIntegrationTest > 
shouldCompactTopicsForStateChangelogs PASSED

org.apache.kafka.streams.integration.InternalTopicIntegrationTest > 
shouldUseCompactAndDeleteForWindowStoreChangelogs STARTED

org.apache.kafka.streams.integration.InternalTopicIntegrationTest > 
shouldUseCompactAndDeleteForWindowStoreChangelogs PASSED

org.apache.kafka.streams.StreamsConfigTest > testGetConsumerConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > testGetConsumerConfigs PASSED

org.apache.kafka.streams.StreamsConfigTest > shouldAcceptAtLeastOnce STARTED

org.apache.kafka.streams.StreamsConfigTest > shouldAcceptAtLeastOnce PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldAllowSettingProducerEnableIdempotenceIfEosDisabled STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldAllowSettingProducerEnableIdempotenceIfEosDisabled PASSED

org.apache.kafka.streams.StreamsConfigTest > defaultSerdeShouldBeConfigured 
STARTED

org.apache.kafka.streams.StreamsConfigTest > defaultSerdeShouldBeC

Re: Pull-Requests not worked on

2017-06-08 Thread Colin McCabe
So, there has been some instability in the junit tests recently due to
some big features landing.  This is starting to improve a lot, though,
so hopefully it won't be a problem for much longer.

If you do hit what you think is a bogus unit test failure, you can type
"retest this please" in the PR to trigger another junit run.

It would be good to close some of the stale PRs.  I have some myself
that I should probably close since they have been superseded by
different approaches.

With regard to OSGi specifically, it might be hard to find someone who
has the relevant expertise to review the patch.  Since our understanding
is limited, maybe it makes sense to have someone else who understands
OSGi repackage the software for that system?  I'm just tossing out ideas
here, though-- I could be wrong.

best,
Colin


On Tue, May 23, 2017, at 09:48, Jeff Widman wrote:
> I agree with this. As a new contributor, it was a bit demoralizing to look
> at all the open PRs and wonder whether, when I sent a patch, it would just
> be left to sit in the ether.
> 
> In other projects I'm involved with, more typically the maintainers go
> through periodically and close old PRs that will never be merged. I know
> at this point it's an intimidating amount of work, but I still think it'd
> be useful to cut down this backlog.
> 
> Maybe at the SF Kafka summit sprint have a group that does this? It's a
> decent task for n00bs who want to help but don't know where to start: ask
> them to help identify PRs that are ancient and should be closed as they
> will never be merged.
> 
> On Tue, May 23, 2017 at 4:59 AM,  wrote:
> 
> > Hello everyone
> >
> > I am wondering how pull requests are handled for Kafka? There is currently
> > a huge number of PRs on GitHub and most of them are not getting any
> > attention.
> >
> > If the maintainers only look at PRs which passed CI (which makes
> > sense given the volume), then there is a problem, because the CI pipeline
> > is not stable. I've submitted a PR myself which adds OSGi metadata to the
> > kafka-clients artifact (see 2882). The pipeline fails randomly even though
> > the change only adds some entries to the manifest.
> > The next issue is that people submitting PRs cannot look at
> > the failing CI job. So with regards to my PR, I don't have a clue what went
> > wrong.
> > If I am missing something in the process please let me know.
> > Regarding PR 2882, please consider merging it because it would save the
> > OSGi community the effort of wrapping the kafka artifact and deploying it
> > with different coordinates on Maven Central (which can confuse users).
> > regards
> > Marc
> >


[jira] [Issue Comment Deleted] (KAFKA-1044) change log4j to slf4j

2017-06-08 Thread Viktor Somogyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viktor Somogyi updated KAFKA-1044:
--
Comment: was deleted

(was: @ewencp, thank you for your help.
I've started this task and all in all my conclusion is that we should delay 
this until the release of slf4j 1.8 (alpha2 is already out), as there are some 
log4j features that are being used but not part of slf4j, such as the 
LogManager in Log4jController or the Fatal log level.
It is a bit hard to work around these effectively without breaking 
behaviour. The best I can think of is to use the error level instead of fatal for 
the time being, or to separate Log4jController into its own module and try to 
load it at runtime if log4j is being used, to get rid of the hard log4j 
dependency, but these are quite hacky and I'm not sure if it's worth the effort 
now as we have a workaround for the original issue. Please let me know what you 
think.)

> change log4j to slf4j 
> --
>
> Key: KAFKA-1044
> URL: https://issues.apache.org/jira/browse/KAFKA-1044
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.8.0
>Reporter: shijinkui
>Assignee: Viktor Somogyi
>Priority: Minor
>  Labels: newbie
>
> Can you change log4j to slf4j? In my project I use logback, and it conflicts 
> with log4j.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #3272: MINOR: disable flaky Streams EOS integration tests

2017-06-08 Thread mjsax
Github user mjsax closed the pull request at:

https://github.com/apache/kafka/pull/3272




[GitHub] kafka pull request #3273: HOTFIX: for flaky Streams EOS integration tests

2017-06-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3273




Build failed in Jenkins: kafka-0.11.0-jdk7 #123

2017-06-08 Thread Apache Jenkins Server
See 


Changes:

[wangguoz] KAFKA-5357 follow-up: Yammer metrics, not Kafka Metrics

--
[...truncated 2.38 MB...]

org.apache.kafka.streams.integration.KStreamsFineGrainedAutoResetIntegrationTest
 > 
shouldOnlyReadRecordsWhereEarliestSpecifiedWithNoCommittedOffsetsWithDefaultGlobalAutoOffsetResetEarliest
 PASSED

org.apache.kafka.streams.integration.KStreamsFineGrainedAutoResetIntegrationTest
 > shouldThrowStreamsExceptionNoResetSpecified STARTED

org.apache.kafka.streams.integration.KStreamsFineGrainedAutoResetIntegrationTest
 > shouldThrowStreamsExceptionNoResetSpecified PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testShouldReadFromRegexAndNamedTopics STARTED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testShouldReadFromRegexAndNamedTopics PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testRegexMatchesTopicsAWhenCreated STARTED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testRegexMatchesTopicsAWhenCreated PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testMultipleConsumersCanReadFromPartitionedTopic STARTED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testMultipleConsumersCanReadFromPartitionedTopic PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testRegexMatchesTopicsAWhenDeleted STARTED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testRegexMatchesTopicsAWhenDeleted PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testNoMessagesSentExceptionFromOverlappingPatterns STARTED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testNoMessagesSentExceptionFromOverlappingPatterns PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
shouldAddStateStoreToRegexDefinedSource STARTED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
shouldAddStateStoreToRegexDefinedSource PASSED

org.apache.kafka.streams.integration.KStreamRepartitionJoinTest > 
shouldCorrectlyRepartitionOnJoinOperationsWithZeroSizedCache STARTED

org.apache.kafka.streams.integration.KStreamRepartitionJoinTest > 
shouldCorrectlyRepartitionOnJoinOperationsWithZeroSizedCache PASSED

org.apache.kafka.streams.integration.KStreamRepartitionJoinTest > 
shouldCorrectlyRepartitionOnJoinOperationsWithNonZeroSizedCache STARTED

org.apache.kafka.streams.integration.KStreamRepartitionJoinTest > 
shouldCorrectlyRepartitionOnJoinOperationsWithNonZeroSizedCache PASSED

org.apache.kafka.streams.integration.InternalTopicIntegrationTest > 
shouldCompactTopicsForStateChangelogs STARTED

org.apache.kafka.streams.integration.InternalTopicIntegrationTest > 
shouldCompactTopicsForStateChangelogs PASSED

org.apache.kafka.streams.integration.InternalTopicIntegrationTest > 
shouldUseCompactAndDeleteForWindowStoreChangelogs STARTED

org.apache.kafka.streams.integration.InternalTopicIntegrationTest > 
shouldUseCompactAndDeleteForWindowStoreChangelogs PASSED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldReduceSessionWindows STARTED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldReduceSessionWindows PASSED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldReduce STARTED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldReduce PASSED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldAggregate STARTED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldAggregate PASSED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldCount STARTED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldCount PASSED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldGroupByKey STARTED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldGroupByKey PASSED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldCountWithInternalStore STARTED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldCountWithInternalStore PASSED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldReduceWindowed STARTED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldReduceWindowed PASSED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldCountSessionWindows STARTED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldCountSessionWindows PASSED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldAggregateWindowed STARTED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldAggregateWindowed P

[jira] [Commented] (KAFKA-1044) change log4j to slf4j

2017-06-08 Thread Viktor Somogyi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043329#comment-16043329
 ] 

Viktor Somogyi commented on KAFKA-1044:
---

@ewencp, thank you for your help.
I've started this task and all in all my conclusion is that we should delay 
this until the release of slf4j 1.8 (alpha2 is already out), as there are some 
log4j features that are being used but not part of slf4j, such as the 
LogManager in Log4jController or the Fatal log level.
It is a bit hard to work around these effectively without breaking 
behaviour. The best I can think of is to use the error level instead of fatal for 
the time being, or to separate Log4jController into its own module and try to 
load it at runtime if log4j is being used, to get rid of the hard log4j 
dependency, but these are quite hacky and I'm not sure if it's worth the effort 
now as we have a workaround for the original issue. Please let me know what you 
think.
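
As a rough sketch of the error-level workaround mentioned above (the FATAL marker is just an illustrative convention, not something the codebase currently uses; slf4j-api and a binding are assumed to be on the classpath):

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.Marker;
import org.slf4j.MarkerFactory;

public class FatalViaSlf4j {
    private static final Logger log = LoggerFactory.getLogger(FatalViaSlf4j.class);
    private static final Marker FATAL = MarkerFactory.getMarker("FATAL");

    public static void main(String[] args) {
        // slf4j has no fatal level; log at error level and tag the event with a
        // FATAL marker so backends that understand markers can still distinguish it.
        log.error(FATAL, "Unrecoverable condition, shutting down");
    }
}
{code}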

> change log4j to slf4j 
> --
>
> Key: KAFKA-1044
> URL: https://issues.apache.org/jira/browse/KAFKA-1044
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.8.0
>Reporter: shijinkui
>Assignee: Viktor Somogyi
>Priority: Minor
>  Labels: newbie
>
> Can you change log4j to slf4j? In my project I use logback, and it conflicts 
> with log4j.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5409) Providing a custom client-id to the ConsoleProducer tool

2017-06-08 Thread Bharat Viswanadham (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043268#comment-16043268
 ] 

Bharat Viswanadham commented on KAFKA-5409:
---

Hi Paolo,
Yeah, I missed that point.
I think to use the client.id sent by the user, we need to have an option --client-id 
or update the code to call producerProps after setting the default props in 
getNewProducerProps.
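
For reference, a minimal sketch of how a custom client.id is passed through the producer config; the topic and id below are made up, and this is simply the setting a hypothetical --client-id option would end up supplying:

{code}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ClientIdExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // the value a --client-id flag (or a generated random id) would supply
        props.put(ProducerConfig.CLIENT_ID_CONFIG, "console-producer-42");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test-topic", "key", "value"));
        }
    }
}
{code}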

> Providing a custom client-id to the ConsoleProducer tool
> 
>
> Key: KAFKA-5409
> URL: https://issues.apache.org/jira/browse/KAFKA-5409
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Reporter: Paolo Patierno
>Priority: Minor
>
> Hi,
> I see that the client-id property for the ConsoleProducer tool is always 
> "console-producer". It could be useful to have it as a parameter on the command 
> line or to generate a random one, as happens for the ConsoleConsumer.
> If it makes sense to you, I can work on that.
> Thanks,
> Paolo.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build is back to normal : kafka-trunk-jdk8 #1678

2017-06-08 Thread Apache Jenkins Server
See 




[jira] [Commented] (KAFKA-5357) StackOverFlow error in transaction coordinator

2017-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043233#comment-16043233
 ] 

ASF GitHub Bot commented on KAFKA-5357:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3242


> StackOverFlow error in transaction coordinator
> --
>
> Key: KAFKA-5357
> URL: https://issues.apache.org/jira/browse/KAFKA-5357
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.11.0.0
>Reporter: Apurva Mehta
>Assignee: Damian Guy
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
> Attachments: KAFKA-5357.tar.gz
>
>
> I observed the following in the broker logs: 
> {noformat}
> [2017-06-01 04:10:36,664] ERROR [Replica Manager on Broker 1]: Error 
> processing append operation on partition __transaction_state-37 
> (kafka.server.ReplicaManager)
> [2017-06-01 04:10:36,667] ERROR [TxnMarkerSenderThread-1]: Error due to 
> (kafka.common.InterBrokerSendThread)
> java.lang.StackOverflowError
> at java.security.AccessController.doPrivileged(Native Method)
> at java.io.PrintWriter.(PrintWriter.java:116)
> at java.io.PrintWriter.(PrintWriter.java:100)
> at 
> org.apache.log4j.DefaultThrowableRenderer.render(DefaultThrowableRenderer.java:58)
> at 
> org.apache.log4j.spi.ThrowableInformation.getThrowableStrRep(ThrowableInformation.java:87)
> at 
> org.apache.log4j.spi.LoggingEvent.getThrowableStrRep(LoggingEvent.java:413)
> at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:313)
> at 
> org.apache.log4j.DailyRollingFileAppender.subAppend(DailyRollingFileAppender.java:369)
> at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
> at 
> org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
> at 
> org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
> at org.apache.log4j.Category.callAppenders(Category.java:206)
> at org.apache.log4j.Category.forcedLog(Category.java:391)
> at org.apache.log4j.Category.error(Category.java:322)
> at kafka.utils.Logging$class.error(Logging.scala:105)
> at kafka.server.ReplicaManager.error(ReplicaManager.scala:122)
> at 
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:557)
> at 
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:505)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
> at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.AbstractTraversable.map(Traversable.scala:104)
> at 
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:505)
> at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:346)
> at 
> kafka.coordinator.transaction.TransactionStateManager$$anonfun$appendTransactionToLog$1.apply$mcV$sp(TransactionStateManager.scala:589)
> at 
> kafka.coordinator.transaction.TransactionStateManager$$anonfun$appendTransactionToLog$1.apply(TransactionStateManager.scala:570)
> at 
> kafka.coordinator.transaction.TransactionStateManager$$anonfun$appendTransactionToLog$1.apply(TransactionStateManager.scala:570)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:213)
> at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:219)
> at 
> kafka.coordinator.transaction.TransactionStateManager.appendTransactionToLog(TransactionStateManager.scala:564)
> at 
> kafka.coordinator.transaction.TransactionMarkerChannelManager.kafka$coordinator$transaction$TransactionMarkerChannelManager$$retryAppendCallback$1(TransactionMarkerChannelManager.scala:225)
> at 
> kafka.coordinator.transaction.TransactionMarkerChannelManager$$anonfun$kafka$coordinator$transaction$TransactionMarkerChannelManager$$retryAppendCallback$1$4.apply(TransactionMarkerChannelManager.scala:225)
> at 
> kafka.coordinator.transaction.TransactionMarkerChannelManager$$anonfun$kafka$coordinator$transaction$TransactionMarkerChannelManager$$retryAppendCallback$1$4.apply(TransactionMarkerChannelManager.scala:225)
> at 
> kafka.coordinator.transaction.TransactionStateManager.kafka$coordinator$transaction$TransactionStateManager$$updateCacheCallback$1(TransactionStateManager.scala:561)
> at 
> kafka.coordinator.transaction.TransactionStateManager$$anonfun$appendTransactionToLog$1$$anonfun$apply$mcV$sp$4
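
The repeating frames above (retryAppendCallback -> appendTransactionToLog ->
updateCacheCallback -> retryAppendCallback ...) point to a synchronous retry loop.
A minimal toy illustration of that failure mode, not the coordinator code itself:

```
// Toy example only: a retry callback that re-enters the append path synchronously
// adds one stack frame per failed attempt, so a persistently failing append ends
// in StackOverflowError. Draining retries from a queue or loop instead of recursing
// keeps the stack depth constant.
public class SyncRetryOverflow {

    static void appendToLog(Runnable retryCallback) {
        boolean appendFailed = true;   // pretend every append fails
        if (appendFailed) {
            retryCallback.run();       // synchronous retry => unbounded recursion
        }
    }

    static void retryAppendCallback() {
        appendToLog(SyncRetryOverflow::retryAppendCallback);
    }

    public static void main(String[] args) {
        retryAppendCallback();         // throws StackOverflowError almost immediately
    }
}
```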

[GitHub] kafka pull request #3242: KAFKA-5357 follow-up: Yammer metrics, not Kafka Me...

2017-06-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3242


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-150 - Kafka-Streams Cogroup

2017-06-08 Thread Xavier Léauté
Another reason for the serde not to be in the first cogroup call is that
the serde should not be required if you pass a StateStoreSupplier to
aggregate().

Regarding the aggregated type, I don't see why the initializer should be
favored over the aggregator to define the type. In my mind, separating the
initializer into the last aggregate call clearly indicates that the
initializer is independent of any of the aggregators or streams, and that we
don't wait for grouped1 events to initialize the co-group.
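
For reference, the generic type parameters in the snippets quoted below were stripped by
the archive. Here is a hedged sketch of the fluent usage under discussion; the shape
matches what later shipped in Kafka Streams (2.5+), but the topics, value types, and
string-concatenating aggregators are invented for illustration and are not the KIP text:

```
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KGroupedStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;

public class CogroupSketch {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KGroupedStream<String, String> grouped1 =
                builder.stream("clicks", Consumed.with(Serdes.String(), Serdes.String())).groupByKey();
        KGroupedStream<String, String> grouped2 =
                builder.stream("views", Consumed.with(Serdes.String(), Serdes.String())).groupByKey();

        // One aggregator per grouped stream; the initializer and store/serde details
        // appear only once, in the terminal aggregate() call.
        KTable<String, String> cogrouped =
                grouped1.<String>cogroup((key, value, agg) -> agg + "|click:" + value)
                        .cogroup(grouped2, (key, value, agg) -> agg + "|view:" + value)
                        .aggregate(() -> "", Materialized.with(Serdes.String(), Serdes.String()));

        // builder.build() would then be handed to a KafkaStreams instance as usual.
    }
}
```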

On Thu, Jun 8, 2017 at 11:14 AM Guozhang Wang  wrote:

> On a second thought... This is the current proposal API
>
>
> ```
>
>  CogroupedKStream cogroup(final Initializer initializer, final
> Aggregator aggregator, final Serde
> aggValueSerde)
>
> ```
>
>
> If we do not have the initializer in the first co-group, it might be a bit
> awkward for users to specify the aggregator that returns a typed value.
> Maybe it is still better to put these two functions in the same API?
>
>
>
> Guozhang
>
> On Thu, Jun 8, 2017 at 11:08 AM, Guozhang Wang  wrote:
>
> > This suggestion LGTM. I would vote for the first alternative rather than adding
> > it to the `KStreamBuilder`, though.
> >
> > On Thu, Jun 8, 2017 at 10:58 AM, Xavier Léauté 
> > wrote:
> >
> >> I have a minor suggestion to make the API a little bit more symmetric.
> >> I feel it would make more sense to move the initializer and serde to the
> >> final aggregate statement, since the serde only applies to the state
> >> store,
> >> and the initializer doesn't bear any relation to the first group in
> >> particular. It would end up looking like this:
> >>
> >> KTable cogrouped =
> >> grouped1.cogroup(aggregator1)
> >> .cogroup(grouped2, aggregator2)
> >> .cogroup(grouped3, aggregator3)
> >> .aggregate(initializer1, aggValueSerde, storeName1);
> >>
> >> Alternatively, we could move the the first cogroup() method to
> >> KStreamBuilder, similar to how we have .merge()
> >> and end up with an api that would be even more symmetric.
> >>
> >> KStreamBuilder.cogroup(grouped1, aggregator1)
> >>   .cogroup(grouped2, aggregator2)
> >>   .cogroup(grouped3, aggregator3)
> >>   .aggregate(initializer1, aggValueSerde, storeName1);
> >>
> >> This doesn't have to be a blocker, but I thought it would make the API
> >> just
> >> a tad cleaner.
> >>
> >> On Tue, Jun 6, 2017 at 3:59 PM Guozhang Wang 
> wrote:
> >>
> >> > Kyle,
> >> >
> >> > Thanks a lot for the updated KIP. It looks good to me.
> >> >
> >> >
> >> > Guozhang
> >> >
> >> >
> >> > On Fri, Jun 2, 2017 at 5:37 AM, Jim Jagielski 
> wrote:
> >> >
> >> > > This makes much more sense to me. +1
> >> > >
> >> > > > On Jun 1, 2017, at 10:33 AM, Kyle Winkelman <
> >> winkelman.k...@gmail.com>
> >> > > wrote:
> >> > > >
> >> > > > I have updated the KIP and my PR. Let me know what you think.
> >> > > > To create a cogrouped stream, just call cogroup on a KGroupedStream and
> >> > > > supply the initializer, aggValueSerde, and an aggregator. Then continue
> >> > > > adding KGroupedStreams and aggregators. Then call one of the many aggregate
> >> > > > calls to create a KTable.
> >> > > >
> >> > > > Thanks,
> >> > > > Kyle
> >> > > >
> >> > > > On Jun 1, 2017 4:03 AM, "Damian Guy" 
> wrote:
> >> > > >
> >> > > >> Hi Kyle,
> >> > > >>
> >> > > >> Thanks for the update. I think just one initializer makes sense
> as
> >> it
> >> > > >> should only be called once per key and generally it is just going
> >> to
> >> > > create
> >> > > >> a new instance of whatever the Aggregate class is.
> >> > > >>
> >> > > >> Cheers,
> >> > > >> Damian
> >> > > >>
> >> > > >> On Wed, 31 May 2017 at 20:09 Kyle Winkelman <
> >> winkelman.k...@gmail.com
> >> > >
> >> > > >> wrote:
> >> > > >>
> >> > > >>> Hello all,
> >> > > >>>
> >> > > >>> I have spent some more time on this and the best alternative I
> >> have
> >> > > come
> >> > > >> up
> >> > > >>> with is:
> >> > > >>> KGroupedStream has a single cogroup call that takes an
> initializer
> >> > and
> >> > > an
> >> > > >>> aggregator.
> >> > > >>> CogroupedKStream has a cogroup call that takes additional
> >> > groupedStream
> >> > > >>> aggregator pairs.
> >> > > >>> CogroupedKStream has multiple aggregate methods that create the
> >> > > different
> >> > > >>> stores.
> >> > > >>>
> >> > > >>> I plan on updating the kip but I want people's input on if we
> >> should
> >> > > have
> >> > > >>> the initializer be passed in once at the beginning or if we
> should
> >> > > >> instead
> >> > > >>> have the initializer be required for each call to one of the
> >> > aggregate
> >> > > >>> calls. The first makes more sense to me but doesn't allow the user to
> >> > > >>> specify different initializers for different tables.
> >> > > >>>
> >> > > >>> Thanks,
> >> > > >>> Kyle
> >> > > >>>
> >> > > >>> On May 24, 2017 7:46 PM, "Kyle Winkelman" <
> >> winkelman.k...@gmail.com>
> >> > > >>> wrote:
> >> > > >>>
> >> > > >

Re: [DISCUSS] KIP-164 Add unavailablePartitionCount and per-partition Unavailable metrics

2017-06-08 Thread Dong Lin
Hey Ismael,

Thanks for the feedback! Could you please vote for the KIP if it looks
good? Then I will find two more committers to vote as well.

Thanks,
Dong

On Tue, May 30, 2017 at 9:08 AM, Dong Lin  wrote:

> Thanks Edoardo and everyone for the comment! That is a very good point. I
> have updated the wiki to use UnderMinIsrPartitionCount as the per-broker
> metric name and UnderMinIsr as the per-partition metric name. The
> motivation section has also been updated to clarify how the existence of
> UnderMinIsrPartition reduces the availability of the Kafka service.
>
> Please find the latest wiki at https://cwiki.apache.org/
> confluence/display/KAFKA/KIP-164-+Add+UnderMinIsrPartitionCount+and+
> per-partition+UnderMinIsr+metrics .
>
>
>
> On Tue, May 30, 2017 at 5:35 AM, Ismael Juma  wrote:
>
>> Thanks for the KIP, Dong. I agree that the metrics are useful. Like
>> Edoardo and Mickael said, it seems like it may be better to choose a
>> different name. A couple of additional suggestions:
>> `UnderMinIsrPartitionCount` and `UnderMinIsr`.
>>
>> Ismael
>>
>> On Tue, May 30, 2017 at 12:04 PM, Mickael Maison <
>> mickael.mai...@gmail.com>
>> wrote:
>>
>> > What about simply calling them 'BelowIsrPartitionCount' and 'BelowIsr' ?
>> >
>> > On Tue, May 30, 2017 at 11:40 AM, Edoardo Comar 
>> wrote:
>> > > Hi Dong,
>> > >
>> > > many thanks for the KIP. It's a very useful metric.
>> > >
>> > > by saying
>> > >> Unavailable partitions could be most easily defined as “The number of
>> > > partitions that this broker leads for which the ISR is insufficient to
>> > > meet the minimum ISR required.”
>> > >
>> > > I presume you meant to call 'Unavailable' the partitions whose
>> ISR.size <
>> > > min.insync  ?
>> > >
>> > > Now, a partition whose ISR is < min.insync can still be used to consume
>> > > messages from. It can also be used to produce messages to, as long as the
>> > > producer does not request acks=-1 (i.e. acks=all); see the sketch after
>> > > this thread.
>> > >
>> > > So it is not exactly 'Unavailable' ... perhaps we could call it
>> 'Unsafe'
>> > ?
>> > > Or the community can come up with a better name.
>> > >
>> > > I recently had a few discussions about the issue, and I opened a PR to
>> > > update the docs (that's still hoping to be reviewed and merged ...
>> hint
>> > > hint :-)
>> > > https://github.com/apache/kafka/pull/3035
>> > > https://issues.apache.org/jira/browse/KAFKA-5290
>> > >
>> > > Thanks!
>> > > Edo
>> > > --
>> > > Edoardo Comar
>> > > IBM Message Hub
>> > > eco...@uk.ibm.com
>> > > IBM UK Ltd, Hursley Park, SO21 2JN
>> > >
>> > >
>> > >
>> > >
>> > > From:   Mickael Maison 
>> > > To: dev@kafka.apache.org
>> > > Date:   30/05/2017 10:51
>> > > Subject:Re: [DISCUSS] KIP-164 Add unavailablePartitionCount
>> and
>> > > per-partition Unavailable metrics
>> > >
>> > >
>> > >
>> > > +1
>> > > It's a mystery how this didn't already exist, as it's one of the key
>> > > cluster health indicators.
>> > >
>> > > On Mon, May 29, 2017 at 9:18 AM, Gwen Shapira 
>> wrote:
>> > >> Hi,
>> > >>
>> > >> Sounds good. I was sure this existed already for some reason :)
>> > >>
>> > >> On Sun, May 28, 2017 at 11:06 AM Dong Lin 
>> wrote:
>> > >>
>> > >>> Hi,
>> > >>>
>> > >>> We created KIP-164 to propose adding per-partition metric
>> *Unavailable*
>> > > and
>> > >>> per-broker metric *UnavailablePartitionCount*
>> > >>>
> >> > >>> The KIP wiki can be found at
>> > >>>
>> > >>>
>> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-164-+Add+
>> > unavailablePartitionCount+and+per-partition+Unavailable+metrics
>> > >
>> > >>> .
>> > >>>
>> > >>> Comments are welcome.
>> > >>>
>> > >>> Thanks,
>> > >>> Dong
>> > >>>
>> > >
>> > >
>> > >
>> > >
>> > >
>> >
>>
>
>
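
To make Edoardo's point above concrete, here is a hedged sketch (broker address and
topic name are placeholders, and it assumes the topic's ISR has already shrunk below
min.insync.replicas): with acks=all the broker rejects the write with
NotEnoughReplicasException, while with acks=1 the same send succeeds, and consumers can
keep reading in both cases.

```
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class MinIsrAcksSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // With "all" (equivalent to -1), the broker requires min.insync.replicas
        // in-sync replicas before acknowledging; if the ISR is smaller, the send fails
        // with NotEnoughReplicasException. With "1" the leader alone acknowledges.
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("under-min-isr-topic", "key", "value")).get();
        }
    }
}
```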


Re: [VOTE] KIP-164 Add unavailablePartitionCount and per-partition Unavailable metrics

2017-06-08 Thread Dong Lin
Thanks to everyone who voted! I am waiting for committers to vote before
closing this thread.

On Mon, Jun 5, 2017 at 6:10 AM, Bill Bejeck  wrote:

> Thanks for the KIP.
>
> +1
>
> -Bill
>
> On Mon, Jun 5, 2017 at 8:51 AM, Edoardo Comar  wrote:
>
> > +1 (non binding)
> >
> > Dong, thanks for the KIP :-)
> > --
> > Edoardo Comar
> > IBM Message Hub
> > eco...@uk.ibm.com
> > IBM UK Ltd, Hursley Park, SO21 2JN
> >
> >
> >
> >
> > From:Michal Borowiecki 
> > To:dev@kafka.apache.org
> > Date:02/06/2017 10:25
> > Subject:Re: [VOTE] KIP-164 Add unavailablePartitionCount and
> > per-partition Unavailable metrics
> > --
> >
> >
> >
> > +1 (non binding)
> >
> > Thanks,
> > Michał
> >
> > On 02/06/17 10:18, Mickael Maison wrote:
> > +1 (non binding)
> > Thanks for the KIP
> >
> > On Thu, Jun 1, 2017 at 5:44 PM, Dong Lin  wrote:
> >
> > Hi all,
> >
> > Can you please vote for KIP-164? The KIP can be found at
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-164-+Add+UnderMinIsrPartitionCount+and+per-partition+UnderMinIsr+metrics
> > .
> >
> > Thanks,
> > Dong
> >
> >
> > --
> > Michal Borowiecki
> > Senior Software Engineer L4
> > T: +44 208 742 1600
> > +44 203 249 8448
> >
> > E: michal.borowie...@openbet.com
> > W: www.openbet.com
> > OpenBet Ltd
> > Chiswick Park Building 9
> > 566 Chiswick High Rd
> > London
> > W4 5XT
> > UK
> > 
> >
> >
> >
> >
> >
>

