[jira] [Created] (KAFKA-5378) Last Stable Offset not returned in Fetch request

2017-06-03 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-5378:
--

 Summary: Last Stable Offset not returned in Fetch request
 Key: KAFKA-5378
 URL: https://issues.apache.org/jira/browse/KAFKA-5378
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jason Gustafson
Assignee: Jason Gustafson
 Fix For: 0.11.0.0


Looks like we didn't update KafkaApis to return the last stable offset in the 
fetch response. The consumer doesn't use it for anything at the moment, but it 
would still be good to fix for debugging purposes.
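For context, a minimal sketch of a read_committed consumer setup (broker address, group id, and serde choices are hypothetical). In read_committed mode consumption is bounded by the last stable offset, which is why surfacing the LSO in the fetch response is useful for debugging:

{noformat}
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "lso-debug");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
// Fetches in this mode return data only up to the last stable offset.
props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
{noformat}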





[jira] [Commented] (KAFKA-5376) Transactions: Concurrent transactional consumer reads aborted messages

2017-06-03 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036170#comment-16036170
 ] 

Jason Gustafson commented on KAFKA-5376:


[~apurva] Please post the log data from one of the failing test runs.

> Transactions: Concurrent transactional consumer reads aborted messages
> --
>
> Key: KAFKA-5376
> URL: https://issues.apache.org/jira/browse/KAFKA-5376
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Jason Gustafson
>Priority: Blocker
>  Labels: exactly-once
>
> This may be a dup of KAFKA-5355, but the system tests in KAFKA-5366 show 
> that a concurrent transactional consumer reads aborted messages. For the test 
> in question the clients are bounced 6 times. With a transaction size of 500, 
> we expect 3000 aborted messages. The concurrent consumer regularly overcounts 
> by 1000 to 1500 messages, suggesting that some aborted transactions 
> are consumed. 
> {noformat}
> 
> test_id:
> kafkatest.tests.core.transactions_test.TransactionsTest.test_transactions.failure_mode=clean_bounce.bounce_target=clients
> status: FAIL
> run time:   1 minute 56.102 seconds
> Detected 1000 dups in concurrently consumed messages
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python2.7/dist-packages/ducktape/tests/runner_client.py", 
> line 123, in run
> data = self.run_test()
>   File 
> "/usr/local/lib/python2.7/dist-packages/ducktape/tests/runner_client.py", 
> line 176, in run_test
> return self.test_context.function(self.test)
>   File "/usr/local/lib/python2.7/dist-packages/ducktape/mark/_mark.py", line 
> 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File "/opt/kafka-dev/tests/kafkatest/tests/core/transactions_test.py", line 
> 235, in test_transactions
> assert num_dups_in_concurrent_consumer == 0, "Detected %d dups in 
> concurrently consumed messages" % num_dups_in_concurrent_consumer
> AssertionError: Detected 1000 dups in concurrently consumed messages
> {noformat}
> This behavior continues even after https://github.com/apache/kafka/pull/3221 
> was merged. 





[jira] [Commented] (KAFKA-5368) Kafka Streams skipped-records-rate sensor produces nonzero values when the timestamps are valid

2017-06-03 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036169#comment-16036169
 ] 

Matthias J. Sax commented on KAFKA-5368:


I actually think we should add an integration test here. We can use a custom 
TS extractor that returns -1 timestamps in a controlled manner.
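A sketch of such a test-only extractor (class name hypothetical; shown with the two-argument extract() variant, so the exact signature depends on the Streams version):

{noformat}
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

public class EveryNthInvalidTimestampExtractor implements TimestampExtractor {
    private final int n;
    private long seen = 0;

    public EveryNthInvalidTimestampExtractor(final int n) { this.n = n; }

    @Override
    public long extract(final ConsumerRecord<Object, Object> record, final long previousTimestamp) {
        // Return an invalid (-1) timestamp for every n-th record so the test
        // can assert the expected skipped-records-rate deterministically.
        return (++seen % n == 0) ? -1L : record.timestamp();
    }
}
{noformat}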

> Kafka Streams skipped-records-rate sensor produces nonzero values when the 
> timestamps are valid
> ---
>
> Key: KAFKA-5368
> URL: https://issues.apache.org/jira/browse/KAFKA-5368
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Hamidreza Afzali
>Assignee: Hamidreza Afzali
> Fix For: 0.11.0.0
>
>
> Kafka Streams skipped-records-rate sensor produces nonzero values even when 
> the timestamps are valid and records are processed. The values are equal to 
> poll-rate.
> Related issue: KAFKA-5055 





[GitHub] kafka pull request #3232: MINOR: update docs with regard to KIP-123

2017-06-03 Thread mjsax
GitHub user mjsax opened a pull request:

https://github.com/apache/kafka/pull/3232

MINOR: update docs with regard to KIP-123



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mjsax/kafka minor-update-docs-for-kip-123

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3232.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3232


commit 1c1c0ccc32f8e5f1a425f3b49f3ccba1a546d778
Author: Matthias J. Sax 
Date:   2017-06-04T03:31:28Z

MINOR: update docs with regard to KIP-123






Re: [DISCUSS]: KIP-161: streams record processing exception handlers

2017-06-03 Thread Matthias J. Sax
What I don't understand is this:

> From there on its the easiest way forward: fix, redeploy, start => done 

If you have many producers that work fine and a new "bad" producer
starts up and writes bad data into your input topic, your Streams app
dies but all your producers, including the bad one, keep writing.

Thus, how would you fix this, as you cannot "remove" the corrupted data
from the topic? It might take some time to identify the root cause and
stop the bad producer. Up to this point you get good and bad data into
your Streams input topic. If the Streams app is not able to skip over those
bad records, how would you get all the good data from the topic? Not
saying it's not possible, but it's extra work copying the data with a
new non-Streams consumer-producer app into a new topic and then feeding
your Streams app from this new topic -- you also need to update all your
upstream producers to write to the new topic.

Thus, if you want to fail fast, you can still do this. And after you
detected and fixed the bad producer you might just reconfigure your app
to skip bad records until it reaches the good part of the data.
Afterwards, you could redeploy with fail-fast again.


Thus, for this pattern, I actually don't see any reason to stop the
Streams app at all. If you have a callback, and use the callback to
raise an alert (and maybe get the bad data into a bad-record queue), it
will not take any longer to identify and stop the "bad" producer. But in
this case, you have zero downtime for your Streams app.

This seems to be much simpler. Or am I missing anything?
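To make this concrete, one hypothetical shape of such a callback (the KIP is still under discussion, so the interface and method names below are illustrative, not a final API):

{noformat}
import org.apache.kafka.clients.consumer.ConsumerRecord;

public interface RecordExceptionHandler {
    enum Response { CONTINUE, FAIL }

    // Called when a record cannot be processed; an implementation could raise
    // an alert, forward the raw record to a "bad record" topic, and CONTINUE.
    Response handle(ConsumerRecord<byte[], byte[]> record, Exception exception);
}
{noformat}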


Having said this, I agree that the "threshold based callback" might be
questionable. But as you argue for strict "fail-fast", I want to argue
that it need not always be the best pattern to apply and that the
overall KIP idea is super useful from my point of view.


-Matthias


On 6/3/17 11:57 AM, Jan Filipiak wrote:
> Could not agree more!
> 
> But then I think the easiest is still: print exception and die.
> From there on it's the easiest way forward: fix, redeploy, start => done.
> 
> All the other ways to recover a pipeline that was processing partially
> all the time and suddenly went over an "I can't take it anymore"
> threshold are not straightforward IMO.
> 
> How to find the offset where it became too bad, when it is not the
> latest committed one? How to reset there, with some reasonable stuff in
> your RocksDB stores?
> 
> Suppose one would do the following: the continuing handler would measure
> against a threshold and would terminate after that threshold has been
> passed (per task). Then one can use offset commit/flush intervals to
> make a reasonable assumption of how much is slipping by, you get an
> easy recovery when it gets too bad, and you could also account for "in
> processing" records.
> 
> Setting this threshold to zero would cover all cases with one
> implementation. It is still beneficial to have it pluggable.
> 
> Again, CRC errors are the only bad pills we have seen in production so far.
> 
> Best Jan
> 
> 
> On 02.06.2017 17:37, Jay Kreps wrote:
>> Jan, I agree with you philosophically. I think one practical challenge
>> has
>> to do with data formats. Many people use untyped events, so there is
>> simply
>> no guarantee on the form of the input. E.g. many companies use JSON
>> without
>> any kind of schema so it becomes very hard to assert anything about the
>> input which makes these programs very fragile to the "one accidental
>> message publication that creates an unsolvable problem."
>>
>> For that reason I do wonder if limiting to just serialization actually
>> gets
>> you a useful solution. For JSON it will help with the problem of
>> non-parseable JSON, but sounds like it won't help in the case where the
>> JSON is well-formed but does not have any of the fields you expect and
>> depend on for your processing. I expect the reason for limiting the scope
>> is it is pretty hard to reason about correctness for anything that
>> stops in
>> the middle of processing an operator DAG?
>>
>> -Jay
>>
>> On Fri, Jun 2, 2017 at 4:50 AM, Jan Filipiak 
>> wrote:
>>
>>> IMHO you're doing it wrong then. Plus, building too much support into the
>>> Kafka ecosystem is very counterproductive in fostering a happy userbase.
>>>
>>>
>>>
>>> On 02.06.2017 13:15, Damian Guy wrote:
>>>
 Jan, you have a choice to Fail fast if you want. This is about giving
 people options and there are times when you don't want to fail fast.


 On Fri, 2 Jun 2017 at 11:00 Jan Filipiak 
 wrote:

 Hi
> 1.
> That greatly complicates monitoring. Fail fast gives you that when you
> monitor only the lag of all your apps, you are completely covered. With
> that sort of new application, monitoring is very much more complicated,
> as you now need to monitor the fail % of some special apps as well. In
> my opinion that is a huge downside already.
>
> 2.
> using a schema registry like the Avro stuff, it might not even be the
> record that is broken; it might be just your app unable to fetch a
> schema it now needs. Maybe you got partitioned away from that registry.

[GitHub] kafka pull request #3231: KAFKA-5364: ensurePartitionAdded does not handle p...

2017-06-03 Thread hachikuji
GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/3231

KAFKA-5364: ensurePartitionAdded does not handle pending partitions in 
abortable error state



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka KAFKA-5364

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3231.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3231


commit 7e65cf7e4642e8dca29a02321e29e4701a1712ba
Author: Jason Gustafson 
Date:   2017-06-04T01:26:51Z

KAFKA-5364: ensurePartitionAdded does not handle pending partitions in 
abortable error state






[jira] [Commented] (KAFKA-5364) Producer attempts to send transactional messages before adding partitions to transaction

2017-06-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036140#comment-16036140
 ] 

ASF GitHub Bot commented on KAFKA-5364:
---

GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/3231

KAFKA-5364: ensurePartitionAdded does not handle pending partitions in 
abortable error state



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka KAFKA-5364

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3231.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3231


commit 7e65cf7e4642e8dca29a02321e29e4701a1712ba
Author: Jason Gustafson 
Date:   2017-06-04T01:26:51Z

KAFKA-5364: ensurePartitionAdded does not handle pending partitions in 
abortable error state




> Producer attempts to send transactional messages before adding partitions to 
> transaction
> 
>
> Key: KAFKA-5364
> URL: https://issues.apache.org/jira/browse/KAFKA-5364
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, core, producer 
>Affects Versions: 0.11.0.0
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
> Attachments: KAFKA-5364.tar.gz
>
>
> Due to a race condition between the sender thread and producer.send(), 
> the following is possible: 
> # In KafkaProducer.doSend(), we add partitions to the transaction and then do 
> accumulator.append. 
> # In Sender.run(), we check whether there are transactional requests. If there 
> are, we send them and wait for the response. 
> # If there aren't, we drain the accumulator queue and send the produce 
> requests.
> # The problem is that the sequence 2, 1, 3 is entirely possible. This 
> means that we won't send the 'AddPartitions' request, yet we still try to send 
> the produce data, which results in a fatal error and requires the producer to 
> close. 
> The solution is that in the accumulator.drain, we should check again if there 
> are pending add partitions requests, and if so, don't drain anything.
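A schematic sketch of the proposed guard (method and field names are hypothetical, not the actual RecordAccumulator code):

{noformat}
// Inside the drain path of a transactional producer: before handing batches
// to the sender, re-check whether an AddPartitions request is still pending
// and, if so, drain nothing for now.
if (transactionManager != null && transactionManager.hasPendingAddPartitions()) {
    return Collections.emptyMap();
}
{noformat}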





Re: Sink Processor definition

2017-06-03 Thread Matthias J. Sax
I think "sink" it the correct term here. It means that we write to a topic.

Processors, that don't have downstream nodes are called "terminal
operation" (at least in the DSL). Thus, a sink is also a "terminal
operation" but not the other way round.

So the docs are not optimal, as they put the "terminal" part into the
focus while the "writing to a topic" part is the main thing here.
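For reference, a minimal Processor API sketch (topic, node, and class names are hypothetical): "SINK" is a sink processor because it writes to a topic, while a processor that only updates a state store would merely be a terminal operation:

{noformat}
import org.apache.kafka.streams.processor.TopologyBuilder;

TopologyBuilder builder = new TopologyBuilder();
builder.addSource("SOURCE", "input-topic")               // source processor
       .addProcessor("PROC", MyProcessor::new, "SOURCE") // MyProcessor implements Processor
       .addSink("SINK", "output-topic", "PROC");         // sink processor: writes to a topic
{noformat}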


-Matthias

On 6/3/17 1:52 AM, Michal Borowiecki wrote:
> Yes, I think the key distinction, from the point of view of that
> documentation section, is that it doesn't have downstream processors.
> 
> 
> On 03/06/17 09:48, Damian Guy wrote:
>> Hi Michal,
>>
>> In this case Sink Processor is really referring to a SinkNode that can
>> only produce to a kafka topic. Maybe the terminology is incorrect as
>> strictly speaking a processor that writes data to anything could be
>> considered a Sink Processor.
>>
>> On Sat, 3 Jun 2017 at 09:23 Michal Borowiecki wrote:
>>
>> Hi all,
>>
>> Streams docs say:
>>
>>>   * *Sink Processor*: A sink processor is a special type of
>>> stream processor that does not have down-stream processors.
>>> It sends any received records from its up-stream processors
>>> to a specified Kafka topic.
>>>
>> Would a processor that doesn't produce to a kafka topic (directly)
>> but only updates a state store also be considered a sink
>> processor? I think yes.
>>
>> I'll submit a PR to that effect unless I hear otherwise.
>>
>> Cheers,
>>
>> Michał





[jira] [Commented] (KAFKA-4325) Improve processing of late records for window operations

2017-06-03 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036135#comment-16036135
 ] 

Matthias J. Sax commented on KAFKA-4325:


I assume yes. \cc [~guozhang]

> Improve processing of late records for window operations
> 
>
> Key: KAFKA-4325
> URL: https://issues.apache.org/jira/browse/KAFKA-4325
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Priority: Minor
>
> Windows are kept until their retention time has passed. If a late arriving record 
> is processed that is older than any window kept, a new window is created 
> containing this single late arriving record, the aggregation is computed, and 
> the window is immediately discarded afterward (as it is older than the 
> retention time).
> This behavior might cause problems for downstream applications, as the original 
> window aggregate might be overwritten with the late single-record aggregate 
> value. Thus, we should rather not process the late arriving record in this 
> case.
> However, data loss might not be acceptable for all use cases. In order to 
> enable the user to not lose any data, window operators should allow registering 
> a handler function that is called instead of just dropping the late arriving 
> record.
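Retention is what bounds this behavior: a record older than the {{until()}} horizon maps to a window that has already been dropped. An illustrative DSL snippet (stream variable, window size, and store name are hypothetical):

{noformat}
import org.apache.kafka.streams.kstream.TimeWindows;

// 1-minute windows kept for 5 minutes; records arriving later than the
// 5-minute retention hit windows that were already discarded.
stream.groupByKey()
      .count(TimeWindows.of(60_000L).until(300_000L), "counts-store");
{noformat}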





[jira] [Commented] (KAFKA-5070) org.apache.kafka.streams.errors.LockException: task [0_18] Failed to lock the state directory: /opt/rocksdb/pulse10/0_18

2017-06-03 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036134#comment-16036134
 ] 

Matthias J. Sax commented on KAFKA-5070:


It's unclear to me how increasing poll time (ie, {{poll.ms}}) could resolve 
the issue. Or do you refer to {{max.poll.interval.ms}} -- this would make 
sense. Btw, the default value for it in {{0.10.2.1}} is infinite (ie, 
{{Integer.MAX_VALUE}}). Thus, {{0.10.2.1}} does fix the issue for you, but you 
"invert" the fix if you set a non-default value. Could this be the case?

> org.apache.kafka.streams.errors.LockException: task [0_18] Failed to lock the 
> state directory: /opt/rocksdb/pulse10/0_18
> 
>
> Key: KAFKA-5070
> URL: https://issues.apache.org/jira/browse/KAFKA-5070
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
> Environment: Linux Version
>Reporter: Dhana
>Assignee: Matthias J. Sax
> Attachments: RocksDB_LockStateDirec.7z
>
>
> Notes: we run two instances of the consumer on two different machines/nodes.
> We have 400 partitions and 200 stream threads per consumer, with 2 consumers.
> We perform an HA test (on rebalance: shutdown of one of the consumers/brokers), 
> and we see this happening:
> Error:
> 2017-04-05 11:36:09.352 WARN  StreamThread:1184 StreamThread-66 - Could not 
> create task 0_115. Will retry.
> org.apache.kafka.streams.errors.LockException: task [0_115] Failed to lock 
> the state directory: /opt/rocksdb/pulse10/0_115
>   at 
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.(ProcessorStateManager.java:102)
>   at 
> org.apache.kafka.streams.processor.internals.AbstractTask.(AbstractTask.java:73)
>   at 
> org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:108)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:834)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:1207)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:1180)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:937)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.access$500(StreamThread.java:69)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:236)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:255)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:339)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:303)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:286)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1030)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:582)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:368)





[jira] [Updated] (KAFKA-5377) Kafka server process crashing due to access violation (caused by log cleaner)

2017-06-03 Thread Markus B (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus B updated KAFKA-5377:

Description: 
We are running Kafka in a 2 x broker cluster configuration on Windows, and 
overall it has been working well for us. We have been seeing occasional issues 
where the broker crashes first on one node, and then almost immediately on the 
second. When we go and try to re-start, the broker continues to crash during 
startup until we fix the issue that caused the crash.

I finally figured out that the root cause of the startup crashes was a bad set 
of files in __consumer_offsets-2 (in this latest case; which offset is the 
cause varies). Once I deleted the bad files, the broker started up correctly 
again.

From what I can tell, looking at both code, crash dump files, and log files, 
it is all happening because of the log cleaner, and I can pinpoint it down in 
most (if not all) cases to TimeIndex. The java dump file indicates some kind 
of an access violation, but I am not sure when/how that is happening. It seems 
like the initial crashes happen during the compacting/swapping action, and 
then the startups fail when they try to access the bad files 
(TimeIndex.parse()).

I am attaching dump files from two separate instances of when it initially 
crashed, and then when we try to restart. Also including the broker config 
settings that we are using.

I'm not sure what additional information to provide, but I can add more if 
needed.

Any help, suggestions or input would be much appreciated.

  was:
We are running Kafka in a 2 x broker cluster configuration on Windows, and 
overall it has been working well for us. We have been seeing occasional issues 
where the broker crashes first on one node, and then almost immediately on the 
second. When we go and try to re-start, the broker continues to crash during 
startup.

I finally figured out that the cause of the startup failures was a bad set 
of files in __consumer_offsets-2 (in this latest case). Once I deleted the bad 
files, the broker started up correctly again. 

From what I can tell, looking at both code and crash dump files, it is all 
happening because of the log cleaner, and I can pinpoint it down in most (if 
not all) cases to TimeIndex. The java dump file indicates some kind of an 
access violation, but I am not sure when/how that is happening.

I am attaching dump files from two separate instances of when it initially 
crashed, and then when we try to restart. Also including the broker config 
settings that we are using.

I'm not sure what additional information to provide, but I can add more if 
needed.

Any help, suggestions or input would be much appreciated.


> Kafka server process crashing due to access violation (caused by log cleaner)
> -
>
> Key: KAFKA-5377
> URL: https://issues.apache.org/jira/browse/KAFKA-5377
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.2.0, 0.10.2.1
> Environment: Windows 2008 R2, Intel Xeon CPU, 64 GB RAM
> 4 Disk Drives (C for software, D for log files, E/F for Kafka/Zookeeper data)
> 2 broker cluster
>Reporter: Markus B
>  Labels: windows
> Attachments: hs_err_pid15944.log, hs_err_pid6304.log, 
> hs_err_pid7356.log, hs_err_pid9056.log, hs_err_pid9276.log, 
> java_error7192.log, server.1.properties
>
>
> We are running Kafka in a 2 x broker cluster configuration on Windows, and 
> overall it has been working well for us. We have been seeing occasional 
> issues where the broker crashes first on one node, and then almost 
> immediately on the second. When we go and try to re-start, the broker 
> continues to crash during startup until we fix the issue that caused the 
> crash.
> I finally figured out that the root cause of the startup crashes was a bad 
> set of files in __consumer_offsets-2 (in this latest case; which offset is 
> the cause varies). Once I deleted the bad files, the broker started up 
> correctly again.
> From what I can tell, looking at both code, crash dump files, and log files, 
> it is all happening because of the log cleaner, and I can pinpoint it down in 
> most (if not all) cases to TimeIndex. The java dump file indicates some kind 
> of an access violation, but I am not sure when/how that is happening. It 
> seems like the initial crashes happen during the compacting/swapping action, 
> and then the startups fail when they try to access the bad files 
> (TimeIndex.parse()).
> I am attaching dump files from two separate instances of when it initially 
> crashed, and then when we try to restart. Also including the broker config 
> settings that we are using.
> I'm not sure what additional information to provide, but I can add more if 
> needed.
> Any help, suggestions or input would be much appreciated.

[jira] [Updated] (KAFKA-5377) Kafka server process crashing due to access violation (caused by log cleaner)

2017-06-03 Thread Markus B (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus B updated KAFKA-5377:

Summary: Kafka server process crashing due to access violation (caused by 
log cleaner)  (was: Kafka process crashing due to access violation)

> Kafka server process crashing due to access violation (caused by log cleaner)
> -
>
> Key: KAFKA-5377
> URL: https://issues.apache.org/jira/browse/KAFKA-5377
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.2.0, 0.10.2.1
> Environment: Windows 2008 R2, Intel Xeon CPU, 64 GB RAM
> 4 Disk Drives (C for software, D for log files, E/F for Kafka/Zookeeper data)
> 2 broker cluster
>Reporter: Markus B
>  Labels: windows
> Attachments: hs_err_pid15944.log, hs_err_pid6304.log, 
> hs_err_pid7356.log, hs_err_pid9056.log, hs_err_pid9276.log, 
> java_error7192.log, server.1.properties
>
>
> We are running Kafka in a 2 x broker cluster configuration on Windows, and 
> overall it has been working well for us. We have been seeing occasional 
> issues where the broker crashes first on one node, and then almost 
> immediately on the second. When we go and try to re-start, the broker 
> continues to crash during startup.
> I finally figured out that the cause of the startup not working was a bad set 
> of files in __consumer_offsets-2 (in this latest case). Once I deleted the 
> bad files, the broker started up correctly again. 
> From what I can tell, looking at both code and crash dump files, it is all 
> happening because of the log cleaner, and I can pinpoint it down in most (if 
> not all) cases to TimeIndex. The java dump file indicates some kind of an 
> access violation, but I am not sure when/how that is happening.
> I am attaching dump files from two separate instances of when it initially 
> crashed, and then when we try to restart. Also including the broker config 
> settings that we are using.
> I'm not sure what additional information to provide, but I can add more if 
> needed.
> Any help, suggestions or input would be much appreciated.





Re: [DISCUSS] KIP-165: Extend Interactive Queries for return latest update timestamp per key

2017-06-03 Thread Jeyhun Karimov
Hi Matthias,

Thanks for comments.

 - why do you only consider get() and not range() and all() ?


The corresponding jira concentrates on single-key lookups. Moreover, I
could not find a use case for range queries returning records with
timestamps. However, theoretically we can include range() and all() as well.

 - we cannot have a second get() (this would be ambiguous) but need
> another name like getWithTs() (or something better)

 - what use case do you have in mind for getKeyTs() ? Would a single new
> method returning KeyContext not be sufficient?


Thanks for the correction, this is my bad.

 - for backward compatibility, we will also need a new interface and
> cannot just extend the existing one


 I will correct the KIP accordingly.
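
One possible shape (interface and method names are illustrative only, pending the discussion): a separate read-only interface rather than an overloaded get(), which also keeps backward compatibility:

{noformat}
import org.apache.kafka.streams.KeyValue;

public interface ReadOnlyKeyValueStoreWithTimestamp<K, V> {
    // Returns the value for the key together with the timestamp of its
    // latest update, or null if the key is not present.
    KeyValue<V, Long> getWithTs(K key);
}
{noformat}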

Thanks,
Jeyhun

On Fri, Jun 2, 2017 at 7:36 AM, Matthias J. Sax 
wrote:

> Thanks for the KIP Jeyhun.
>
> Some comments:
>  - why do you only consider get() and not range() and all() ?
>  - we cannot have a second get() (this would be ambiguous) but need
> another name like getWithTs() (or something better)
>  - what use case do you have in mind for getKeyTs() ? Would a single new
> method returning KeyContext not be sufficient?
>  - for backward compatibility, we will also need a new interface and
> cannot just extend the existing one
>
>
>
> -Matthias
>
> On 5/29/17 4:55 PM, Jeyhun Karimov wrote:
> > Dear community,
> >
> > I want to share KIP-165 [1] based on issue KAFKA-4304 [2].
> > I would like to get your comments.
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 165%3A+Extend+Interactive+Queries+for+return+latest+
> update+timestamp+per+key
> > [2] https://issues.apache.org/jira/browse/KAFKA-4304
> >
> > Cheers,
> > Jeyhun
> >
>
>


[jira] [Created] (KAFKA-5377) Kafka process crashing due to access violation

2017-06-03 Thread Markus Bergman (JIRA)
Markus Bergman created KAFKA-5377:
-

 Summary: Kafka process crashing due to access violation
 Key: KAFKA-5377
 URL: https://issues.apache.org/jira/browse/KAFKA-5377
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.10.2.1, 0.10.2.0
 Environment: Windows 2008 R2, Intel Xeon CPU, 64 GB RAM
4 Disk Drives (C for software, D for log files, E/F for Kafka/Zookeeper data)
2 broker cluster
Reporter: Markus Bergman
 Attachments: hs_err_pid15944.log, hs_err_pid6304.log, 
hs_err_pid7356.log, hs_err_pid9056.log, hs_err_pid9276.log, java_error7192.log, 
server.1.properties

We are running Kafka in a 2 x broker cluster configuration on Windows, and 
overall it has been working well for us. We have been seeing occasional issues 
where the broker crashes first on one node, and then almost immediately on the 
second. When we go and try to re-start, the broker continues to crash during 
startup.

I finally figured out that the cause of the startup failures was a bad set 
of files in __consumer_offsets-2 (in this latest case). Once I deleted the bad 
files, the broker started up correctly again. 

From what I can tell, looking at both code and crash dump files, it is all 
happening because of the log cleaner, and I can pinpoint it down in most (if 
not all) cases to TimeIndex. The java dump file indicates some kind of an 
access violation, but I am not sure when/how that is happening.

I am attaching dump files from two separate instances of when it initially 
crashed, and then when we try to restart. Also including the broker config 
settings that we are using.

I'm not sure what additional information to provide, but I can add more if 
needed.

Any help, suggestions or input would be much appreciated.





[jira] [Commented] (KAFKA-5355) Broker returns messages beyond "latest stable offset" to transactional consumer in read_committed mode

2017-06-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036112#comment-16036112
 ] 

ASF GitHub Bot commented on KAFKA-5355:
---

GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/3230

KAFKA-5355: Test cases to ensure isolation level propagated in delayed fetch



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka KAFKA-5355-TESTS

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3230.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3230


commit ef598178633d1a2e7e87a3a1e5b4ee879d3e6f47
Author: Jason Gustafson 
Date:   2017-06-03T22:43:02Z

KAFKA-5355: Test cases to ensure isolation level propagated in delayed fetch




> Broker returns messages beyond "latest stable offset" to transactional 
> consumer in read_committed mode
> --
>
> Key: KAFKA-5355
> URL: https://issues.apache.org/jira/browse/KAFKA-5355
> Project: Kafka
>  Issue Type: Sub-task
>  Components: core
>Affects Versions: 0.11.0.0
>Reporter: Matthias J. Sax
>Assignee: Jason Gustafson
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
> Attachments: test.log
>
>
> This issue is exposed by the new Streams EOS integration test.
> Streams has two tasks (ie, two producers with {{pid}} 0 and 2000), both 
> writing to output topic {{output}} with one partition (replication factor 1).
> The test uses a transactional consumer with {{group.id=readCommitted}} to 
> read the data from the {{output}} topic. When it reads the data, each producer 
> has committed 10 records (one producer writes messages with {{key=0}} and the 
> other with {{key=1}}). Furthermore, each producer has an open transaction with 
> 5 uncommitted records written.
> The test fails, as we expect to see 10 records per key, but we get 15 for 
> key=1:
> {noformat}
> java.lang.AssertionError: 
> Expected: <[KeyValue(1, 0), KeyValue(1, 1), KeyValue(1, 3), KeyValue(1, 6), 
> KeyValue(1, 10), KeyValue(1, 15), KeyValue(1, 21), KeyValue(1, 28), 
> KeyValue(1, 36), KeyValue(1, 45)]>
>  but: was <[KeyValue(1, 0), KeyValue(1, 1), KeyValue(1, 3), KeyValue(1, 
> 6), KeyValue(1, 10), KeyValue(1, 15), KeyValue(1, 21), KeyValue(1, 28), 
> KeyValue(1, 36), KeyValue(1, 45), KeyValue(1, 55), KeyValue(1, 66), 
> KeyValue(1, 78), KeyValue(1, 91), KeyValue(1, 105)]>
> {noformat}
> Dumping the segment shows that there are two commit markers (one for each 
> producer) for the first 10 messages written. Furthermore, there are 5 pending 
> records. Thus, the "latest stable offset" should be 21 (20 messages plus 2 
> commit markers), and no data should be returned beyond this offset.
> Dumped Log Segment {{output-0}}
> {noformat}
> Starting offset: 0
> baseOffset: 0 lastOffset: 9 baseSequence: 0 lastSequence: 9 producerId: 0 
> producerEpoch: 6 partitionLeaderEpoch: 0 isTransactional: true position: 0 
> CreateTime: 1496255947332 isvalid: true size: 291 magic: 2 compresscodec: 
> NONE crc: 600535135
> baseOffset: 10 lastOffset: 10 baseSequence: -1 lastSequence: -1 producerId: 0 
> producerEpoch: 6 partitionLeaderEpoch: 0 isTransactional: true position: 291 
> CreateTime: 1496256005429 isvalid: true size: 78 magic: 2 compresscodec: NONE 
> crc: 3458060752
> baseOffset: 11 lastOffset: 20 baseSequence: 0 lastSequence: 9 producerId: 
> 2000 producerEpoch: 2 partitionLeaderEpoch: 0 isTransactional: true position: 
> 369 CreateTime: 1496255947322 isvalid: true size: 291 magic: 2 compresscodec: 
> NONE crc: 3392915713
> baseOffset: 21 lastOffset: 25 baseSequence: 10 lastSequence: 14 producerId: 0 
> producerEpoch: 6 partitionLeaderEpoch: 0 isTransactional: true position: 660 
> CreateTime: 1496255947342 isvalid: true size: 176 magic: 2 compresscodec: 
> NONE crc: 3513911368
> baseOffset: 26 lastOffset: 26 baseSequence: -1 lastSequence: -1 producerId: 
> 2000 producerEpoch: 2 partitionLeaderEpoch: 0 isTransactional: true position: 
> 836 CreateTime: 1496256011784 isvalid: true size: 78 magic: 2 compresscodec: 
> NONE crc: 1619151485
> {noformat}
> Dump with {{--deep-iteration}}
> {noformat}
> Starting offset: 0
> offset: 0 position: 0 CreateTime: 1496255947323 isvalid: true keysize: 8 
> valuesize: 8 magic: 2 compresscodec: NONE crc: 600535135 sequence: 0 
> headerKeys: [] key: 1 payload: 0
> offset: 1 position: 0 CreateTime: 1496255947324 isvalid: true keysize: 8 
> valuesize: 8 magic: 2 compresscodec: NONE crc: 600535135 sequence: 1 
> headerKeys: [] key: 1 payload: 1
> off

[GitHub] kafka pull request #3230: KAFKA-5355: Test cases to ensure isolation level p...

2017-06-03 Thread hachikuji
GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/3230

KAFKA-5355: Test cases to ensure isolation level propagated in delayed fetch



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka KAFKA-5355-TESTS

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3230.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3230


commit ef598178633d1a2e7e87a3a1e5b4ee879d3e6f47
Author: Jason Gustafson 
Date:   2017-06-03T22:43:02Z

KAFKA-5355: Test cases to ensure isolation level propagated in delayed fetch






[jira] [Created] (KAFKA-5376) Transactions: Concurrent transactional consumer reads aborted messages

2017-06-03 Thread Apurva Mehta (JIRA)
Apurva Mehta created KAFKA-5376:
---

 Summary: Transactions: Concurrent transactional consumer reads 
aborted messages
 Key: KAFKA-5376
 URL: https://issues.apache.org/jira/browse/KAFKA-5376
 Project: Kafka
  Issue Type: Bug
Reporter: Apurva Mehta
Assignee: Jason Gustafson
Priority: Blocker


This may be a dup of KAFKA-5355, but the system tests in KAFKA-5366 show that 
a concurrent transactional consumer reads aborted messages. For the test in 
question the clients are bounced 6 times. With a transaction size of 500, we 
expect 3000 aborted messages. The concurrent consumer regularly overcounts by 
1000 to 1500 messages, suggesting that some aborted transactions are consumed. 

{noformat}

test_id:
kafkatest.tests.core.transactions_test.TransactionsTest.test_transactions.failure_mode=clean_bounce.bounce_target=clients
status: FAIL
run time:   1 minute 56.102 seconds


Detected 1000 dups in concurrently consumed messages
Traceback (most recent call last):
  File 
"/usr/local/lib/python2.7/dist-packages/ducktape/tests/runner_client.py", line 
123, in run
data = self.run_test()
  File 
"/usr/local/lib/python2.7/dist-packages/ducktape/tests/runner_client.py", line 
176, in run_test
return self.test_context.function(self.test)
  File "/usr/local/lib/python2.7/dist-packages/ducktape/mark/_mark.py", line 
321, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/opt/kafka-dev/tests/kafkatest/tests/core/transactions_test.py", line 
235, in test_transactions
assert num_dups_in_concurrent_consumer == 0, "Detected %d dups in 
concurrently consumed messages" % num_dups_in_concurrent_consumer
AssertionError: Detected 1000 dups in concurrently consumed messages
{noformat}

This behavior continues even after https://github.com/apache/kafka/pull/3221 
was merged. 





[jira] [Created] (KAFKA-5375) Transactions: Concurrent transactional consumer loses messages when there are broker bounces

2017-06-03 Thread Apurva Mehta (JIRA)
Apurva Mehta created KAFKA-5375:
---

 Summary: Transactions: Concurrent transactional consumer loses 
messages when there are broker bounces
 Key: KAFKA-5375
 URL: https://issues.apache.org/jira/browse/KAFKA-5375
 Project: Kafka
  Issue Type: Bug
Reporter: Apurva Mehta
Assignee: Apurva Mehta
Priority: Blocker
 Fix For: 0.11.0.0


With the system test modifications in KAFKA-5366, the concurrent reader almost 
always consumes only a fraction of the expected messages when there are broker 
bounces. A consumer running without concurrent writes consumes all the messages 
from the topic in question in the same test.

{noformat}

test_id:
kafkatest.tests.core.transactions_test.TransactionsTest.test_transactions.failure_mode=clean_bounce.bounce_target=brokers
status: FAIL
run time:   1 minute 59.169 seconds


Input and concurrently consumed output message sets are not equal. Num 
input messages: 2. Num concurrently_consumed_messages: 0
Traceback (most recent call last):
  File 
"/usr/local/lib/python2.7/dist-packages/ducktape/tests/runner_client.py", line 
123, in run
data = self.run_test()
  File 
"/usr/local/lib/python2.7/dist-packages/ducktape/tests/runner_client.py", line 
176, in run_test
return self.test_context.function(self.test)
  File "/usr/local/lib/python2.7/dist-packages/ducktape/mark/_mark.py", line 
321, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/opt/kafka-dev/tests/kafkatest/tests/core/transactions_test.py", line 
238, in test_transactions
(len(input_message_set), len(concurrently_consumed_message_set))
AssertionError: Input and concurrently consumed output message sets are not 
equal. Num input messages: 2. Num concurrently_consumed_messages: 0
{noformat}





[jira] [Updated] (KAFKA-5375) Transactions: Concurrent transactional consumer loses messages when there are broker bounces

2017-06-03 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta updated KAFKA-5375:

Labels: exactly-once  (was: )

> Transactions: Concurrent transactional consumer loses messages when there are 
> broker bounces
> 
>
> Key: KAFKA-5375
> URL: https://issues.apache.org/jira/browse/KAFKA-5375
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
>
> With the system test modifications in KAFKA-5366, the concurrent reader 
> almost always consumes only a fraction of the expected messages when there are 
> broker bounces. A consumer running without concurrent writes consumes all the 
> messages from the topic in question in the same test.
> {noformat}
> 
> test_id:
> kafkatest.tests.core.transactions_test.TransactionsTest.test_transactions.failure_mode=clean_bounce.bounce_target=brokers
> status: FAIL
> run time:   1 minute 59.169 seconds
> Input and concurrently consumed output message sets are not equal. Num 
> input messages: 2. Num concurrently_consumed_messages: 0
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python2.7/dist-packages/ducktape/tests/runner_client.py", 
> line 123, in run
> data = self.run_test()
>   File 
> "/usr/local/lib/python2.7/dist-packages/ducktape/tests/runner_client.py", 
> line 176, in run_test
> return self.test_context.function(self.test)
>   File "/usr/local/lib/python2.7/dist-packages/ducktape/mark/_mark.py", line 
> 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File "/opt/kafka-dev/tests/kafkatest/tests/core/transactions_test.py", line 
> 238, in test_transactions
> (len(input_message_set), len(concurrently_consumed_message_set))
> AssertionError: Input and concurrently consumed output message sets are not 
> equal. Num input messages: 2. Num concurrently_consumed_messages: 0
> {noformat}





[jira] [Commented] (KAFKA-5364) Producer attempts to send transactional messages before adding partitions to transaction

2017-06-03 Thread Apurva Mehta (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036092#comment-16036092
 ] 

Apurva Mehta commented on KAFKA-5364:
-

I am leaving the fix version as 0.11.0.0 for now. But since PR 3202 was merged, 
this has become very rare. So the most common case looks like it is solved. The 
concurrent read tests (KAFKA-5366) fail all the time because of data 
inconsistency in the concurrent reader, so I am going to focus on those before 
getting to this, as those are more common issues.



> Producer attempts to send transactional messages before adding partitions to 
> transaction
> 
>
> Key: KAFKA-5364
> URL: https://issues.apache.org/jira/browse/KAFKA-5364
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, core, producer 
>Affects Versions: 0.11.0.0
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
> Attachments: KAFKA-5364.tar.gz
>
>
> Due to a race condition between the sender thread and producer.send(), 
> the following is possible: 
> # In KafkaProducer.doSend(), we add partitions to the transaction and then do 
> accumulator.append. 
> # In Sender.run(), we check whether there are transactional requests. If there 
> are, we send them and wait for the response. 
> # If there aren't, we drain the accumulator queue and send the produce 
> requests.
> # The problem is that the sequence 2, 1, 3 is entirely possible. This 
> means that we won't send the 'AddPartitions' request, yet we still try to send 
> the produce data, which results in a fatal error and requires the producer to 
> close. 
> The solution is that in the accumulator.drain, we should check again if there 
> are pending add partitions requests, and if so, don't drain anything.





[jira] [Reopened] (KAFKA-5364) Producer attempts to send transactional messages before adding partitions to transaction

2017-06-03 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta reopened KAFKA-5364:
-

> Producer attempts to send transactional messages before adding partitions to 
> transaction
> 
>
> Key: KAFKA-5364
> URL: https://issues.apache.org/jira/browse/KAFKA-5364
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, core, producer 
>Affects Versions: 0.11.0.0
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
> Attachments: KAFKA-5364.tar.gz
>
>
> Due to a race condition between the sender thread and producer.send(), 
> the following is possible: 
> # In KafkaProducer.doSend(), we add partitions to the transaction and then do 
> accumulator.append. 
> # In Sender.run(), we check whether there are transactional requests. If there 
> are, we send them and wait for the response. 
> # If there aren't, we drain the accumulator queue and send the produce 
> requests.
> # The problem is that the sequence 2, 1, 3 is entirely possible. This 
> means that we won't send the 'AddPartitions' request, yet we still try to send 
> the produce data, which results in a fatal error and requires the producer to 
> close. 
> The solution is that in the accumulator.drain, we should check again if there 
> are pending add partitions requests, and if so, don't drain anything.





[jira] [Updated] (KAFKA-5364) Producer attempts to send transactional messages before adding partitions to transaction

2017-06-03 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta updated KAFKA-5364:

Attachment: KAFKA-5364.tar.gz

This has recurred once in more than two dozen runs since PR 3202 was merged. So 
the issue still exists, but is much rarer. Reopening.

> Producer attempts to send transactional messages before adding partitions to 
> transaction
> 
>
> Key: KAFKA-5364
> URL: https://issues.apache.org/jira/browse/KAFKA-5364
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, core, producer 
>Affects Versions: 0.11.0.0
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
> Attachments: KAFKA-5364.tar.gz
>
>
> Due to a race condition between the sender thread and producer.send(), 
> the following is possible: 
> # In KafkaProducer.doSend(), we add partitions to the transaction and then do 
> accumulator.append. 
> # In Sender.run(), we check whether there are transactional requests. If there 
> are, we send them and wait for the response. 
> # If there aren't, we drain the accumulator queue and send the produce 
> requests.
> # The problem is that the sequence 2, 1, 3 is entirely possible. This 
> means that we won't send the 'AddPartitions' request, yet we still try to send 
> the produce data, which results in a fatal error and requires the producer to 
> close. 
> The solution is that in the accumulator.drain, we should check again if there 
> are pending add partitions requests, and if so, don't drain anything.





[jira] [Commented] (KAFKA-5347) OutOfSequence error should be fatal

2017-06-03 Thread Apurva Mehta (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036089#comment-16036089
 ] 

Apurva Mehta commented on KAFKA-5347:
-

I think this can be punted to a bug fix release. This is a minor improvement 
which would make the producer failure modes more consistent and make the user 
experience a little better once there is an OutOfOrderSequence error (which 
should really never happen).

But I think our current priority for the 0.11.0.0 release should be getting the 
transactions system tests fully stable on streams and core, and I think we 
still have a lot of work to do there.

> OutOfSequence error should be fatal
> ---
>
> Key: KAFKA-5347
> URL: https://issues.apache.org/jira/browse/KAFKA-5347
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jason Gustafson
>Assignee: Apurva Mehta
>  Labels: exactly-once
> Fix For: 0.11.0.1
>
>
> If the producer sees an OutOfSequence error for a given partition, we 
> currently treat it as an abortable error. This makes some sense because 
> OutOfSequence won't prevent us from being able to send the EndTxn to abort 
> the transaction. The problem is that the producer, even after aborting, still 
> won't be able to send to the topic with an OutOfSequence. One way to deal 
> with this is to ask the user to call {{initTransactions()}} again to bump the 
> epoch, but this is a bit difficult to explain and could be dangerous since it 
> renders zombie checking less effective. Probably we should just consider 
> OutOfSequence fatal for the transactional producer.
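To make the asymmetry concrete, a sketch of the recovery path described above (producer setup and record are hypothetical, and the re-init step is the remedy under discussion, not necessarily supported by the current client):

{noformat}
import org.apache.kafka.common.errors.OutOfOrderSequenceException;

producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(record);
    producer.commitTransaction();
} catch (OutOfOrderSequenceException e) {
    producer.abortTransaction(); // currently allowed: the error is abortable...
    producer.initTransactions(); // ...but recovery needs an epoch bump, which
                                 // makes zombie checking less effective
}
{noformat}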





Re: [VOTE] KIP-113 - Support replicas movement between log directories

2017-06-03 Thread Dong Lin
Hey Joel,

Thanks much for the review! I have updated the KIP to address all your
comments. Please find the diff here.

Regards,
Dong

On Sat, Jun 3, 2017 at 11:43 AM, Joel Koshy  wrote:

> +1
>
> Few additional comments (most of which we discussed offline):
>
> - This was summarized in the "discuss" thread, but it is worth recording
>   in the KIP itself that the LEO in DescribeDirsResponse is useful to
>   measure progress of the move.
> - num.replica.move.threads defaults to the # of log directories - perhaps
>   note that we typically expect a 1-1 mapping to disks for this to work
>   well.
> - Can you clarify in the KIP whether intra.broker.throttled.rate is
>   per-broker or per-thread?
> - Reassignment JSON: can log_dirs be made optional? i.e., its absence
>   would mean "any" (see the illustrative JSON after this list).
> - Can you also explicitly state somewhere that "any" translates to
>   round-robin assignment today?
>
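On Joel's reassignment JSON question above: an illustrative example (topic and paths hypothetical) of what explicit and "any" log_dirs entries could look like; each log_dirs list lines up with the replicas list:

{noformat}
{"version": 1,
 "partitions": [
   {"topic": "foo", "partition": 0, "replicas": [1, 2],
    "log_dirs": ["/data/d1", "any"]}
 ]}
{noformat}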
> On Mon, Apr 3, 2017 at 9:49 AM, Dong Lin  wrote:
>
> > Hi all,
> >
> > It seems that there is no further concern with the KIP-113. We would like
> > to start the voting process. The KIP can be found at
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 113%3A+Support+replicas+movement+between+log+directories
> >
> > Thanks,
> > Dong
> >
>


[jira] [Updated] (KAFKA-5347) OutOfSequence error should be fatal

2017-06-03 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta updated KAFKA-5347:

Fix Version/s: 0.11.0.1

> OutOfSequence error should be fatal
> ---
>
> Key: KAFKA-5347
> URL: https://issues.apache.org/jira/browse/KAFKA-5347
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jason Gustafson
>Assignee: Apurva Mehta
>  Labels: exactly-once
> Fix For: 0.11.0.1
>
>
> If the producer sees an OutOfSequence error for a given partition, we 
> currently treat it as an abortable error. This makes some sense because 
> OutOfSequence won't prevent us from being able to send the EndTxn to abort 
> the transaction. The problem is that the producer, even after aborting, still 
> won't be able to send to the topic with an OutOfSequence. One way to deal 
> with this is to ask the user to call {{initTransactions()}} again to bump the 
> epoch, but this is a bit difficult to explain and could be dangerous since it 
> renders zombie checking less effective. Probably we should just consider 
> OutOfSequence fatal for the transactional producer.





Re: [DISCUSS]: KIP-161: streams record processing exception handlers

2017-06-03 Thread Jan Filipiak

Could not agree more!

But then I think the easiest is still: print exception and die.
From there on its the easiest way forward: fix, redeploy, start => done

All the other ways to recover a pipeline that was processing partially 
all the time
and suddenly went over a "I cant take it anymore" threshold is not 
straight forward IMO.


How to find the offset, when it became to bad when it is not the latest 
commited one?

How to reset there? with some reasonable stuff in your rockses?

If one would do the following. The continuing Handler would measure for 
a threshold and
would terminate after a certain threshold has passed (per task). Then 
one can use offset commit/ flush intervals
to make reasonable assumption of how much is slipping by + you get an 
easy recovery when it gets to bad

+ you could also account for "in processing" records.

Setting this threshold to zero would cover all cases with 1 
implementation. It is still beneficial to have it pluggable


Again, CRC errors are the only bad pills we have seen in production so far.

Best Jan


On 02.06.2017 17:37, Jay Kreps wrote:

Jan, I agree with you philosophically. I think one practical challenge has
to do with data formats. Many people use untyped events, so there is simply
no guarantee on the form of the input. E.g. many companies use JSON without
any kind of schema, so it becomes very hard to assert anything about the
input, which makes these programs very fragile to the "one accidental
message publication that creates an unsolvable problem."

For that reason I do wonder if limiting to just serialization actually gets
you a useful solution. For JSON it will help with the problem of
non-parseable JSON, but sounds like it won't help in the case where the
JSON is well-formed but does not have any of the fields you expect and
depend on for your processing. I expect the reason for limiting the scope
is that it is pretty hard to reason about correctness for anything that stops
in the middle of processing an operator DAG?

-Jay

On Fri, Jun 2, 2017 at 4:50 AM, Jan Filipiak 
wrote:


IMHO you're doing it wrong then. Plus, building too much support into the Kafka
ecosystem is very counterproductive in fostering a happy userbase.



On 02.06.2017 13:15, Damian Guy wrote:


Jan, you have a choice to Fail fast if you want. This is about giving
people options and there are times when you don't want to fail fast.


On Fri, 2 Jun 2017 at 11:00 Jan Filipiak 
wrote:

Hi

1.
That greatly complicates monitoring. Fail fast gives you that: when you
monitor only the lag of all your apps, you are completely covered. With
that sort of new application, monitoring is very much more complicated, as
you now need to monitor the fail % of some special apps as well. In my
opinion that is a huge downside already.

2.
Using a schema registry like the Avro stack, it might not even be the record
that is broken; it might just be your app unable to fetch a schema it now
needs to know. Maybe you got partitioned away from that registry.

3. When you get alerted because of too high a fail percentage, what are the
steps you are going to take? Shut it down to buy time. Fix the problem.
Spend way too much time finding a good reprocess offset. Your time windows
are in bad shape anyway, and you have pretty much lost. This routine is
nonsense.

Dead letter queues would be the worst possible addition to the Kafka
toolkit that I can think of. It just doesn't fit an architecture in which
clients falling behind is a valid option.

Further, as I mentioned already, the only bad pill I've seen so far is CRC
errors. Any plans for those?

Best Jan






On 02.06.2017 11:34, Damian Guy wrote:


I agree with what Matthias has said w.r.t failing fast. There are plenty
of times when you don't want to fail-fast and must attempt to make
progress.

The dead-letter queue is exactly for these circumstances. Of course if
every record is failing, then you probably do want to give up.

On Fri, 2 Jun 2017 at 07:56 Matthias J. Sax wrote:


First a meta comment. KIP discussion should take place on the dev list
-- if user list is cc'ed please make sure to reply to both lists.
Thanks.

Thanks for making the scope of the KIP clear. Makes a lot of sense to
focus on deserialization exceptions for now.

With regard to corrupted state stores, would it make sense to fail a
task and wipe out the store to repair it via recreation from the
changelog? That's of course a quite advanced pattern, but I want to bring
it up to design the first step in a way such that we can get there (if
we think it's a reasonable idea).

I also want to comment about fail fast vs making progress. I think that
fail-fast must not always be the best option. The scenario I have in
mind is like this: you got a bunch of producers that feed the Streams
input topic. Most producers work fine, but maybe one producer misbehaves
and the data it writes is corrupted. You might not even be able
to recover this lost data at any point -- thus, there is no reason to
stop processing but you just skip over

Re: [VOTE] KIP-113 - Support replicas movement between log directories

2017-06-03 Thread Joel Koshy
+1

Few additional comments (most of which we discussed offline):

   - This was summarized in the “discuss” thread, but it is worth recording
     in the KIP itself that the LEO in DescribeDirsResponse is useful to
     measure progress of the move.
   - num.replica.move.threads defaults to # log directories - perhaps note
     that we typically expect a 1-1 mapping to disks for this to work well.
   - Can you clarify in the KIP whether intra.broker.throttled.rate is
     per-broker or per-thread?
   - Reassignment JSON: can log_dirs be made optional? i.e., its absence
     would mean “any” (see the sketch after this list).
   - Can you also explicitly state somewhere that “any” translates to
     round-robin assignment today?
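
For reference, a sketch of a reassignment entry in the KIP-113 style (topic
name and paths are made up; one log_dirs entry per replica, with "any"
leaving the placement to the broker):

{"version": 1,
 "partitions": [
   {"topic": "foo",
    "partition": 0,
    "replicas": [1, 2],
    "log_dirs": ["/data/disk1", "any"]}
 ]}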


On Mon, Apr 3, 2017 at 9:49 AM, Dong Lin  wrote:

> Hi all,
>
> It seems that there is no further concern with KIP-113. We would like
> to start the voting process. The KIP can be found at
> *https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 113%3A+Support+replicas+movement+between+log+directories
>  113%3A+Support+replicas+movement+between+log+directories>.*
>
> Thanks,
> Dong
>


Jenkins build is back to normal : kafka-trunk-jdk8 #1650

2017-06-03 Thread Apache Jenkins Server
See 




[GitHub] kafka pull request #3229: MINOR: (docs) add missing word

2017-06-03 Thread mihbor
GitHub user mihbor opened a pull request:

https://github.com/apache/kafka/pull/3229

MINOR: (docs) add missing word



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mihbor/kafka patch-9

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3229.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3229


commit 53926c242d6dbbcc100689a3ac687d46f0fdc467
Author: mihbor 
Date:   2017-06-03T09:48:15Z

MINOR: (docs) add missing word




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Build failed in Jenkins: kafka-trunk-jdk8 #1649

2017-06-03 Thread Apache Jenkins Server
See 


Changes:

[ismael] KAFKA-5374; Set allow auto topic creation to false when requesting node

[ismael] KAFKA-5019; Upgrades notes for idempotent/transactional features and 
new

--
[...truncated 4.15 MB...]
kafka.utils.UtilsTest > testInLock PASSED

kafka.utils.UtilsTest > testSwallow STARTED

kafka.utils.UtilsTest > testSwallow PASSED

kafka.producer.AsyncProducerTest > testFailedSendRetryLogic STARTED

kafka.producer.AsyncProducerTest > testFailedSendRetryLogic PASSED

kafka.producer.AsyncProducerTest > testQueueTimeExpired STARTED

kafka.producer.AsyncProducerTest > testQueueTimeExpired PASSED

kafka.producer.AsyncProducerTest > testPartitionAndCollateEvents STARTED

kafka.producer.AsyncProducerTest > testPartitionAndCollateEvents PASSED

kafka.producer.AsyncProducerTest > testBatchSize STARTED

kafka.producer.AsyncProducerTest > testBatchSize PASSED

kafka.producer.AsyncProducerTest > testSerializeEvents STARTED

kafka.producer.AsyncProducerTest > testSerializeEvents PASSED

kafka.producer.AsyncProducerTest > testProducerQueueSize STARTED

kafka.producer.AsyncProducerTest > testProducerQueueSize PASSED

kafka.producer.AsyncProducerTest > testRandomPartitioner STARTED

kafka.producer.AsyncProducerTest > testRandomPartitioner PASSED

kafka.producer.AsyncProducerTest > testInvalidConfiguration STARTED

kafka.producer.AsyncProducerTest > testInvalidConfiguration PASSED

kafka.producer.AsyncProducerTest > testInvalidPartition STARTED

kafka.producer.AsyncProducerTest > testInvalidPartition PASSED

kafka.producer.AsyncProducerTest > testNoBroker STARTED

kafka.producer.AsyncProducerTest > testNoBroker PASSED

kafka.producer.AsyncProducerTest > testProduceAfterClosed STARTED

kafka.producer.AsyncProducerTest > testProduceAfterClosed PASSED

kafka.producer.AsyncProducerTest > testJavaProducer STARTED

kafka.producer.AsyncProducerTest > testJavaProducer PASSED

kafka.producer.AsyncProducerTest > testIncompatibleEncoder STARTED

kafka.producer.AsyncProducerTest > testIncompatibleEncoder PASSED

kafka.producer.SyncProducerTest > testReachableServer STARTED

kafka.producer.SyncProducerTest > testReachableServer PASSED

kafka.producer.SyncProducerTest > testMessageSizeTooLarge STARTED

kafka.producer.SyncProducerTest > testMessageSizeTooLarge PASSED

kafka.producer.SyncProducerTest > testNotEnoughReplicas STARTED

kafka.producer.SyncProducerTest > testNotEnoughReplicas PASSED

kafka.producer.SyncProducerTest > testMessageSizeTooLargeWithAckZero STARTED

kafka.producer.SyncProducerTest > testMessageSizeTooLargeWithAckZero PASSED

kafka.producer.SyncProducerTest > testProducerCanTimeout STARTED

kafka.producer.SyncProducerTest > testProducerCanTimeout PASSED

kafka.producer.SyncProducerTest > testProduceRequestWithNoResponse STARTED

kafka.producer.SyncProducerTest > testProduceRequestWithNoResponse PASSED

kafka.producer.SyncProducerTest > testEmptyProduceRequest STARTED

kafka.producer.SyncProducerTest > testEmptyProduceRequest PASSED

kafka.producer.SyncProducerTest > testProduceCorrectlyReceivesResponse STARTED

kafka.producer.SyncProducerTest > testProduceCorrectlyReceivesResponse PASSED

kafka.producer.ProducerTest > testSendToNewTopic STARTED

kafka.producer.ProducerTest > testSendToNewTopic PASSED

kafka.producer.ProducerTest > testAsyncSendCanCorrectlyFailWithTimeout STARTED

kafka.producer.ProducerTest > testAsyncSendCanCorrectlyFailWithTimeout PASSED

kafka.producer.ProducerTest > testSendNullMessage STARTED

kafka.producer.ProducerTest > testSendNullMessage PASSED

kafka.producer.ProducerTest > testUpdateBrokerPartitionInfo STARTED

kafka.producer.ProducerTest > testUpdateBrokerPartitionInfo PASSED

kafka.producer.ProducerTest > testSendWithDeadBroker STARTED

kafka.producer.ProducerTest > testSendWithDeadBroker PASSED

unit.kafka.server.KafkaApisTest > 
shouldRespondWithUnsupportedForMessageFormatOnHandleWriteTxnMarkersWhenMagicLowerThanRequired
 STARTED

unit.kafka.server.KafkaApisTest > 
shouldRespondWithUnsupportedForMessageFormatOnHandleWriteTxnMarkersWhenMagicLowerThanRequired
 PASSED

unit.kafka.server.KafkaApisTest > 
shouldThrowUnsupportedVersionExceptionOnHandleTxnOffsetCommitRequestWhenInterBrokerProtocolNotSupported
 STARTED

unit.kafka.server.KafkaApisTest > 
shouldThrowUnsupportedVersionExceptionOnHandleTxnOffsetCommitRequestWhenInterBrokerProtocolNotSupported
 PASSED

unit.kafka.server.KafkaApisTest > 
shouldThrowUnsupportedVersionExceptionOnHandleAddPartitionsToTxnRequestWhenInterBrokerProtocolNotSupported
 STARTED

unit.kafka.server.KafkaApisTest > 
shouldThrowUnsupportedVersionExceptionOnHandleAddPartitionsToTxnRequestWhenInterBrokerProtocolNotSupported
 PASSED

unit.kafka.server.KafkaApisTest > 
shouldAppendToLogOnWriteTxnMarkersWhenCorrectMagicVersion STARTED

unit.kafka.server.KafkaApisTest > 
shouldAppendToLogOnWriteTxnMarkersWhenCorrectMagicVersion PASSED


[jira] [Commented] (KAFKA-5098) KafkaProducer.send() blocks and generates TimeoutException if topic name has illegal char

2017-06-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035910#comment-16035910
 ] 

ASF GitHub Bot commented on KAFKA-5098:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3223


> KafkaProducer.send() blocks and generates TimeoutException if topic name has 
> illegal char
> -
>
> Key: KAFKA-5098
> URL: https://issues.apache.org/jira/browse/KAFKA-5098
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.10.2.0
> Environment: Java client running against server using 
> wurstmeister/kafka Docker image.
>Reporter: Jeff Larsen
>Assignee: huxihx
> Fix For: 0.11.0.0
>
>
> The server is running with auto create enabled. If we try to publish to a 
> topic with a forward slash in the name, the call blocks and we get a 
> TimeoutException in the Callback. I would expect it to return immediately 
> with an InvalidTopicException.
> There are other blocking issues that have been reported which may be related 
> to some degree, but this particular cause seems unrelated.
> Sample code:
> {code}
> import org.apache.kafka.clients.producer.*;
> import java.util.*;
>
> public class KafkaProducerUnexpectedBlockingAndTimeoutException {
>   public static void main(String[] args) {
>     Properties props = new Properties();
>     props.put("bootstrap.servers", "kafka.example.com:9092");
>     props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
>     props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
>     props.put("max.block.ms", 10000); // 10 seconds should illustrate our point
>
>     String separator = "/";
>     //String separator = "_";
>     try (Producer<String, String> producer = new KafkaProducer<>(props)) {
>       System.out.println("Calling KafkaProducer.send() at " + new Date());
>       producer.send(
>           new ProducerRecord<>("abc" + separator + "someStreamName",
>               "Not expecting a TimeoutException here"),
>           new Callback() {
>             @Override
>             public void onCompletion(RecordMetadata metadata, Exception e) {
>               if (e != null) {
>                 System.out.println(e.toString());
>               }
>             }
>           });
>       System.out.println("KafkaProducer.send() completed at " + new Date());
>     }
>   }
> }
> {code}
> Switching to the underscore separator in the above example works as expected.
> Mea culpa: We neglected to research allowed chars in a topic name, but the 
> TimeoutException we encountered did not help point us in the right direction.
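
Not the actual patch, but for illustration: the gist of a client-side check
against Kafka's topic-name rules (legal characters are ASCII alphanumerics,
'.', '_' and '-', with a maximum length of 249), which would fail immediately
instead of timing out:

{code}
import java.util.regex.Pattern;

public class TopicNameCheck {
  // Kafka topic names: ASCII alphanumerics, '.', '_' and '-', at most 249 chars.
  private static final Pattern LEGAL = Pattern.compile("[a-zA-Z0-9._-]{1,249}");

  public static void validate(String topic) {
    // Failing here, before any network I/O, gives the immediate
    // InvalidTopicException-style behaviour the reporter expected.
    if (topic == null || !LEGAL.matcher(topic).matches())
      throw new IllegalArgumentException("Invalid topic name: " + topic);
  }
}
{code}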



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (KAFKA-5098) KafkaProducer.send() blocks and generates TimeoutException if topic name has illegal char

2017-06-03 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-5098.

   Resolution: Fixed
Fix Version/s: 0.11.0.0

Issue resolved by pull request 3223
[https://github.com/apache/kafka/pull/3223]

> KafkaProducer.send() blocks and generates TimeoutException if topic name has 
> illegal char
> -
>
> Key: KAFKA-5098
> URL: https://issues.apache.org/jira/browse/KAFKA-5098
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.10.2.0
> Environment: Java client running against server using 
> wurstmeister/kafka Docker image.
>Reporter: Jeff Larsen
>Assignee: huxihx
> Fix For: 0.11.0.0
>
>
> The server is running with auto create enabled. If we try to publish to a 
> topic with a forward slash in the name, the call blocks and we get a 
> TimeoutException in the Callback. I would expect it to return immediately 
> with an InvalidTopicException.
> There are other blocking issues that have been reported which may be related 
> to some degree, but this particular cause seems unrelated.
> Sample code:
> {code}
> import org.apache.kafka.clients.producer.*;
> import java.util.*;
>
> public class KafkaProducerUnexpectedBlockingAndTimeoutException {
>   public static void main(String[] args) {
>     Properties props = new Properties();
>     props.put("bootstrap.servers", "kafka.example.com:9092");
>     props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
>     props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
>     props.put("max.block.ms", 10000); // 10 seconds should illustrate our point
>
>     String separator = "/";
>     //String separator = "_";
>     try (Producer<String, String> producer = new KafkaProducer<>(props)) {
>       System.out.println("Calling KafkaProducer.send() at " + new Date());
>       producer.send(
>           new ProducerRecord<>("abc" + separator + "someStreamName",
>               "Not expecting a TimeoutException here"),
>           new Callback() {
>             @Override
>             public void onCompletion(RecordMetadata metadata, Exception e) {
>               if (e != null) {
>                 System.out.println(e.toString());
>               }
>             }
>           });
>       System.out.println("KafkaProducer.send() completed at " + new Date());
>     }
>   }
> }
> {code}
> Switching to the underscore separator in the above example works as expected.
> Mea culpa: We neglected to research allowed chars in a topic name, but the 
> TimeoutException we encountered did not help point us in the right direction.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #3223: KAFKA-5098: KafkaProducer.send() should validate t...

2017-06-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3223


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-4291) TopicCommand --describe shows topics marked for deletion as under-replicated and unavailable (KIP-137)

2017-06-03 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4291:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> TopicCommand --describe shows topics marked for deletion as under-replicated 
> and unavailable (KIP-137)
> --
>
> Key: KAFKA-4291
> URL: https://issues.apache.org/jira/browse/KAFKA-4291
> Project: Kafka
>  Issue Type: Improvement
>  Components: admin
>Affects Versions: 0.10.0.1
>Reporter: Mickael Maison
>Assignee: Mickael Maison
>  Labels: kip
> Fix For: 0.11.0.0
>
>
> When using kafka-topics.sh --describe with --under-replicated-partitions or 
> --unavailable-partitions, topics marked for deletion are listed.
> While this is debatable whether we want to list such topics this way, it 
> should at least print that the topic is marked for deletion, like --list 
> does. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4913) creating a window store with one segment throws division by zero error

2017-06-03 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035902#comment-16035902
 ] 

Ismael Juma commented on KAFKA-4913:


Is this still planned for 0.11.0.0?

> creating a window store with one segment throws division by zero error
> --
>
> Key: KAFKA-4913
> URL: https://issues.apache.org/jira/browse/KAFKA-4913
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Xavier Léauté
>Assignee: Damian Guy
> Fix For: 0.11.0.0
>
>
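
For context, a rough reproduction with the 0.10.2 Stores API (the windowed()
signature is from memory and the store name is made up); presumably the
segment interval is computed as retentionPeriod / (numSegments - 1), which
divides by zero for a single segment:

{code}
Stores.create("one-segment-window")
      .withStringKeys()
      .withStringValues()
      .persistent()
      .windowed(60000L /* windowSize */, 60000L /* retentionPeriod */,
                1 /* numSegments: triggers the division by zero */,
                false /* retainDuplicates */)
      .build();
{code}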




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4291) TopicCommand --describe shows topics marked for deletion as under-replicated and unavailable (KIP-137)

2017-06-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035901#comment-16035901
 ] 

ASF GitHub Bot commented on KAFKA-4291:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2011


> TopicCommand --describe shows topics marked for deletion as under-replicated 
> and unavailable (KIP-137)
> --
>
> Key: KAFKA-4291
> URL: https://issues.apache.org/jira/browse/KAFKA-4291
> Project: Kafka
>  Issue Type: Improvement
>  Components: admin
>Affects Versions: 0.10.0.1
>Reporter: Mickael Maison
>Assignee: Mickael Maison
>  Labels: kip
> Fix For: 0.11.0.0
>
>
> When using kafka-topics.sh --describe with --under-replicated-partitions or 
> --unavailable-partitions, topics marked for deletion are listed.
> While this is debatable whether we want to list such topics this way, it 
> should at least print that the topic is marked for deletion, like --list 
> does. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2011: KAFKA-4291: TopicCommand --describe shows topics m...

2017-06-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2011


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #3228: MINOR: (docs) in section, not at section

2017-06-03 Thread mihbor
GitHub user mihbor opened a pull request:

https://github.com/apache/kafka/pull/3228

MINOR: (docs) in section, not at section

See 
https://english.stackexchange.com/questions/158981/at-this-section-vs-in-this-section
 and 
https://forum.wordreference.com/threads/at-in-within-the-section-we-can-find.374188/

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mihbor/kafka patch-8

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3228.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3228


commit 0d5cee3064eb5a1046e852df6086c8ffb45d6653
Author: mihbor 
Date:   2017-06-03T09:18:42Z

MINOR: in section, not at section

See 
https://english.stackexchange.com/questions/158981/at-this-section-vs-in-this-section
 and 
https://forum.wordreference.com/threads/at-in-within-the-section-we-can-find.374188/




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-147: Add missing type parameters to StateStoreSupplier factories and KGroupedStream/Table methods

2017-06-03 Thread Michal Borowiecki

I agree maintaining backwards-compatibility here adds a lot of overhead.

I haven't so far found a way to reconcile these elegantly.

Whichever way we go it's better to take the pain sooner rather than 
later. Kafka 0.11.0.0 (through KAFKA-5045/KIP-114) increased the surface
affected by the lack of fully type-parametrised suppliers noticeably.


Cheers,

Michał


On 03/06/17 09:43, Damian Guy wrote:
Hmm, I guess this won't work due to adding the additional type parameters to
the StateStoreSupplier params on reduce, count, aggregate etc.


On Sat, 3 Jun 2017 at 09:06 Damian Guy wrote:


Hi Michal,

Thanks for the KIP - is there a way we can do this without having
to introduce the new Typed.. Interfaces, overloaded methods etc?
Is it possible that we just need to provide a couple of new
methods on PersistentKeyValueFactory for windowed and
sessionWindowed to return interfaces like you've introduced in
TypedStores?
I admit I haven't looked in much detail at whether that would work.

My concern is that this is duplicating a bunch of code and
increasing the surface area for what is minimal benefit. It is one
of those cases where I'd love to not have to maintain backward
compatibility.

Thanks,
Damian

On Fri, 2 Jun 2017 at 08:20 Michal Borowiecki wrote:

Thanks Matthias,

I appreciate people are busy now preparing the 0.11 release.

One thing I would also appreciate input on is perhaps a better
name for the new TypedStores class, I just picked it quickly
but don't really like it.

Perhaps StateStores would make for a better name?

Cheers,
Michal


On 02/06/17 07:18, Matthias J. Sax wrote:

Thanks for the update Michal.

I did skip over the PR. Looks good to me, as far as I can tell. Maybe
Damian, Xavier, or Ismael can comment on this. Would be good to get
confirmation that the change is backward compatible.


-Matthias


On 5/27/17 11:11 AM, Michal Borowiecki wrote:

Hi all,

I've updated the KIP to reflect the proposed backwards-compatible 
approach:


https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=69408481


Given the vast area of APIs affected, I think the PR is easier to read
than the code excerpts in the KIP itself:
https://github.com/apache/kafka/pull/2992/files

Thanks,
Michał

On 07/05/17 10:16, Eno Thereska wrote:

I like this KIP in general and I agree it’s needed. Perhaps Damian can 
comment on the session store issue?

Thanks
Eno

On May 6, 2017, at 10:32 PM, Michal Borowiecki wrote:

Hi Matthias,

Agreed. I tried your proposal and indeed it would work.

However, I think to maintain full backward compatibility we would also 
need to deprecate Stores.create() and leave it unchanged, while providing a new 
method that returns the more strongly typed Factories.

( This is because PersistentWindowFactory and PersistentSessionFactory cannot
extend the existing PersistentKeyValueFactory interface, since their build()
methods will be returning TypedStateStoreSupplier<WindowStore<K, V>> and
TypedStateStoreSupplier<SessionStore<K, V>> respectively, which are NOT
subclasses of TypedStateStoreSupplier<KeyValueStore<K, V>>. I do not see
another way around it. Admittedly, my type covariance skills are rudimentary.
Does anyone see a better way around this? )
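
A minimal, simplified illustration of that covariance problem (these are
stand-in interfaces, not the actual KIP types):

// Java generics are invariant: even though WindowStore is a StateStore,
// TypedStateStoreSupplier<WindowStore> is NOT a subtype of
// TypedStateStoreSupplier<StateStore>; only wildcards help.
interface StateStore {}
interface KeyValueStore extends StateStore {}
interface WindowStore extends StateStore {}

interface TypedStateStoreSupplier<T extends StateStore> {
    T get();
}

class InvarianceDemo {
    void demo(TypedStateStoreSupplier<KeyValueStore> kv,
              TypedStateStoreSupplier<WindowStore> w) {
        // TypedStateStoreSupplier<StateStore> s = w;        // does not compile
        TypedStateStoreSupplier<? extends StateStore> ok = w; // wildcard works
    }
}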

Since create() takes only the store name as argument, and I don't see 
what we could overload it with, the new method would need to have a different 
name.

Alternatively, since create(String) is the only method in Stores, we 
could deprecate the entire class and provide a new one. That would be my 
preference. Any ideas what to call it?



All comments and suggestions appreciated.



Cheers,

Michał


On 04/05/17 21:48, Matthias J. Sax wrote:

I had a quick look into this.

With regard to backward compatibility, I think it would be required to
introduce a new type `TypedStateStoreSupplier` (that extends
`StateStoreSupplier`) and to overload all methods that take a
`StateStoreSupplier` so that they accept the new type instead of the current
one.

This would allow `.build` to return a `TypedStateStoreSupplier` and
thus, would not break any code. At least if I did not miss anything with
regard to some magic of type inference using generics (I am not an
expert in this field).


-Matthias

On 5/4/17 11:32 AM, Matthias J. Sax wrote:

Did not have time to have a look. But backward compatibility is a must
from my point of view.

-Matthias


On 5/4/17 12:56 AM, Michal Borowiecki

Re: Sink Processor definition

2017-06-03 Thread Michal Borowiecki
Yes, I think the key distinction, from the point of view of that 
documentation section, is that it doesn't have downstream processors.
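
A sketch with the low-level Processor API may make the distinction concrete
(topic, store and class names are made up; "SINK" matches the docs'
definition, while "STORE-UPDATER" has no downstream processors but writes
to a store rather than a topic):

import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.TopologyBuilder;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.Stores;

public class SinkVsStoreUpdater {

    // A terminal processor that only updates a state store.
    static class StoreUpdater implements Processor<String, String> {
        private KeyValueStore<String, String> store;

        @SuppressWarnings("unchecked")
        public void init(ProcessorContext context) {
            store = (KeyValueStore<String, String>) context.getStateStore("my-store");
        }
        public void process(String key, String value) { store.put(key, value); }
        public void punctuate(long timestamp) {}
        public void close() {}
    }

    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        builder.addSource("SOURCE", "input-topic")
               .addProcessor("STORE-UPDATER", StoreUpdater::new, "SOURCE")
               .addStateStore(Stores.create("my-store").withStringKeys()
                                    .withStringValues().persistent().build(),
                              "STORE-UPDATER")
               // The sink processor in the docs' sense: forwards to a topic.
               .addSink("SINK", "output-topic", "SOURCE");
    }
}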



On 03/06/17 09:48, Damian Guy wrote:

Hi Michal,

In this case Sink Processor is really referring to a SinkNode that can 
only produce to a kafka topic. Maybe the terminology is incorrect as 
strictly speaking a processor that writes data to anything could be 
considered a Sink Processor.


On Sat, 3 Jun 2017 at 09:23 Michal Borowiecki wrote:


Hi all,

Streams docs say:


  * *Sink Processor*: A sink processor is a special type of
stream processor that does not have down-stream processors.
It sends any received records from its up-stream processors
to a specified Kafka topic.


Would a processor that doesn't produce to a kafka topic (directly)
but only updates a state store also be considered a sink
processor? I think yes.

I'll submit a PR to that effect unless I hear otherwise.

Cheers,

Michał




--
Michal Borowiecki
Senior Software Engineer L4
T: +44 208 742 1600 / +44 203 249 8448
E: michal.borowie...@openbet.com
W: www.openbet.com
OpenBet Ltd, Chiswick Park Building 9, 566 Chiswick High Rd, London, W4 5XT, UK








Re: Sink Processor definition

2017-06-03 Thread Damian Guy
Hi Michal,

In this case Sink Processor is really referring to a SinkNode that can only
produce to a kafka topic. Maybe the terminology is incorrect as strictly
speaking a processor that writes data to anything could be considered a
Sink Processor.

On Sat, 3 Jun 2017 at 09:23 Michal Borowiecki 
wrote:

> Hi all,
>
> Streams docs say:
>
>
>- *Sink Processor*: A sink processor is a special type of stream
>processor that does not have down-stream processors. It sends any received
>records from its up-stream processors to a specified Kafka topic.
>
> Would a processor that doesn't produce to a kafka topic (directly) but
> only updates a state store also be considered a sink processor? I think yes.
>
> I'll submit a PR to that effect unless I hear otherwise.
>
> Cheers,
>
> Michał
> --
> Michal Borowiecki
> Senior Software Engineer L4
> T: +44 208 742 1600 / +44 203 249 8448
> E: michal.borowie...@openbet.com
> W: www.openbet.com
> OpenBet Ltd, Chiswick Park Building 9, 566 Chiswick High Rd, London, W4 5XT, UK
> 
>


[GitHub] kafka pull request #3227: MINOR: fix quotes for consistent rendering

2017-06-03 Thread mihbor
GitHub user mihbor opened a pull request:

https://github.com/apache/kafka/pull/3227

MINOR: fix quotes for consistent rendering

“ as opposed to " don't render consistently across browsers. On current 
Kafka website they render correctly in Firefox but not Chrome 
(“runs”) - charset issue?

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mihbor/kafka patch-7

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3227.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3227


commit 17d884c388fb4c7fc330fc96e2a59f70467f744c
Author: mihbor 
Date:   2017-06-03T08:45:46Z

MINOR: fix quotes for consistent rendering

“ as opposed to " don't render consistently across browsers. On current 
Kafka website they render correctly in Firefox but not Chrome 
(“runs”) - charset issue?




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-147: Add missing type parameters to StateStoreSupplier factories and KGroupedStream/Table methods

2017-06-03 Thread Damian Guy
Hmm, I guess this won't work due to adding the additional type parameters to
the StateStoreSupplier params on reduce, count, aggregate etc.

On Sat, 3 Jun 2017 at 09:06 Damian Guy  wrote:

> Hi Michal,
>
> Thanks for the KIP - is there a way we can do this without having to
> introduce the new Typed.. Interfaces, overloaded methods etc? Is it
> possible that we just need to provide a couple of new methods on
> PersistentKeyValueFactory for windowed and sessionWindowed to return
> interfaces like you've introduced in TypedStores?
> I admit I haven't looked in much detail at whether that would work.
>
> My concern is that this is duplicating a bunch of code and increasing the
> surface area for what is minimal benefit. It is one of those cases where
> I'd love to not have to maintain backward compatibility.
>
> Thanks,
> Damian
>
> On Fri, 2 Jun 2017 at 08:20 Michal Borowiecki <
> michal.borowie...@openbet.com> wrote:
>
>> Thanks Matthias,
>>
>> I appreciate people are busy now preparing the 0.11 release.
>>
>> One thing I would also appreciate input on is perhaps a better name for
>> the new TypedStores class, I just picked it quickly but don't really like
>> it.
>>
>> Perhaps StateStores would make for a better name?
>> Cheers,
>> Michal
>>
>>
>> On 02/06/17 07:18, Matthias J. Sax wrote:
>>
>> Thanks for the update Michal.
>>
>> I did skip over the PR. Looks good to me, as far as I can tell. Maybe
>> Damian, Xavier, or Ismael can comment on this. Would be good to get
>> confirmation that the change is backward compatible.
>>
>>
>> -Matthias
>>
>>
>> On 5/27/17 11:11 AM, Michal Borowiecki wrote:
>>
>> Hi all,
>>
>> I've updated the KIP to reflect the proposed backwards-compatible approach:
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=69408481
>>
>>
>> Given the vast area of APIs affected, I think the PR is easier to read
>> than the code excerpts in the KIP 
>> itself:https://github.com/apache/kafka/pull/2992/files
>>
>> Thanks,
>> Michał
>>
>> On 07/05/17 10:16, Eno Thereska wrote:
>>
>> I like this KIP in general and I agree it’s needed. Perhaps Damian can 
>> comment on the session store issue?
>>
>> Thanks
>> Eno
>>
>> On May 6, 2017, at 10:32 PM, Michal Borowiecki wrote:
>>
>> Hi Matthias,
>>
>> Agreed. I tried your proposal and indeed it would work.
>>
>> However, I think to maintain full backward compatibility we would also need 
>> to deprecate Stores.create() and leave it unchanged, while providing a new 
>> method that returns the more strongly typed Factories.
>>
>> ( This is because PersistentWindowFactory and PersistentSessionFactory
>> cannot extend the existing PersistentKeyValueFactory interface, since their
>> build() methods will be returning TypedStateStoreSupplier<WindowStore<K, V>>
>> and TypedStateStoreSupplier<SessionStore<K, V>> respectively, which are NOT
>> subclasses of TypedStateStoreSupplier<KeyValueStore<K, V>>. I do not see
>> another way around it. Admittedly, my type covariance skills are
>> rudimentary. Does anyone see a better way around this? )
>>
>> Since create() takes only the store name as argument, and I don't see what 
>> we could overload it with, the new method would need to have a different 
>> name.
>>
>> Alternatively, since create(String) is the only method in Stores, we could 
>> deprecate the entire class and provide a new one. That would be my 
>> preference. Any ideas what to call it?
>>
>>
>>
>> All comments and suggestions appreciated.
>>
>>
>>
>> Cheers,
>>
>> Michał
>>
>>
>> On 04/05/17 21:48, Matthias J. Sax wrote:
>>
>> I had a quick look into this.
>>
>> With regard to backward compatibility, I think it would be required to
>> introduce a new type `TypedStateStoreSupplier` (that extends
>> `StateStoreSupplier`) and to overload all methods that take a
>> `StateStoreSupplier` so that they accept the new type instead of the current one.
>>
>> This would allow `.build` to return a `TypedStateStoreSupplier` and
>> thus, would not break any code. At least if I did not miss anything with
>> regard to some magic of type inference using generics (I am not an
>> expert in this field).
>>
>>
>> -Matthias
>>
>> On 5/4/17 11:32 AM, Matthias J. Sax wrote:
>>
>> Did not have time to have a look. But backward compatibility is a must
>> from my point of view.
>>
>> -Matthias
>>
>>
>> On 5/4/17 12:56 AM, Michal Borowiecki wrote:
>>
>> Hello,
>>
>> I've updated the KIP with missing information.
>>
>> I would especially appreciate some comments on the compatibility aspects
>> of this as the proposed change is not fully backwards-compatible.
>>
>> In the absence of comments I shall call for a vote in the next few days.
>>
>> Thanks,
>>
>> Michal
>>
>>
>> On 30/04/17 23:11, Michal Borowiecki wrote:
>>
>> Hi community!
>>
>> I have just drafted KIP-147: Add missing type parameters to
>> StateStoreSupplier factories and KGroupedStream/Table methods

[jira] [Updated] (KAFKA-5371) SyncProducerTest.testReachableServer has become flaky

2017-06-03 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-5371:
---
Issue Type: Test  (was: Bug)

> SyncProducerTest.testReachableServer has become flaky
> -
>
> Key: KAFKA-5371
> URL: https://issues.apache.org/jira/browse/KAFKA-5371
> Project: Kafka
>  Issue Type: Test
>Affects Versions: 0.11.0.0
>Reporter: Apurva Mehta
>Assignee: Ismael Juma
> Fix For: 0.11.0.0
>
>
> This test has started failing recently on jenkins with the following 
> {noformat}
> org.scalatest.junit.JUnitTestFailedError: Unexpected failure sending message 
> to broker. null
>   at 
> org.scalatest.junit.AssertionsForJUnit$class.newAssertionFailedException(AssertionsForJUnit.scala:100)
>   at 
> org.scalatest.junit.JUnitSuite.newAssertionFailedException(JUnitSuite.scala:71)
>   at org.scalatest.Assertions$class.fail(Assertions.scala:1089)
>   at org.scalatest.junit.JUnitSuite.fail(JUnitSuite.scala:71)
>   at 
> kafka.producer.SyncProducerTest.testReachableServer(SyncProducerTest.scala:71)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:114)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:57)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:66)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>   at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
>   at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
>   at 
> org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:109)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:147)
>  

[jira] [Updated] (KAFKA-5371) SyncProducerTest.testReachableServer has become flaky

2017-06-03 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-5371:
---
 Assignee: Ismael Juma  (was: Apurva Mehta)
Fix Version/s: 0.11.0.0
   Status: Patch Available  (was: Open)

> SyncProducerTest.testReachableServer has become flaky
> -
>
> Key: KAFKA-5371
> URL: https://issues.apache.org/jira/browse/KAFKA-5371
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0
>Reporter: Apurva Mehta
>Assignee: Ismael Juma
> Fix For: 0.11.0.0
>
>
> This test has started failing recently on jenkins with the following 
> {noformat}
> org.scalatest.junit.JUnitTestFailedError: Unexpected failure sending message 
> to broker. null
>   at 
> org.scalatest.junit.AssertionsForJUnit$class.newAssertionFailedException(AssertionsForJUnit.scala:100)
>   at 
> org.scalatest.junit.JUnitSuite.newAssertionFailedException(JUnitSuite.scala:71)
>   at org.scalatest.Assertions$class.fail(Assertions.scala:1089)
>   at org.scalatest.junit.JUnitSuite.fail(JUnitSuite.scala:71)
>   at 
> kafka.producer.SyncProducerTest.testReachableServer(SyncProducerTest.scala:71)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:114)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:57)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:66)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>   at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
>   at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
>   at 
> org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:109)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.remote.internal.hub.MessageHubBack

Sink Processor definition

2017-06-03 Thread Michal Borowiecki

Hi all,

Streams docs say:


  * *Sink Processor*: A sink processor is a special type of stream
processor that does not have down-stream processors. It sends any
received records from its up-stream processors to a specified
Kafka topic.

Would a processor that doesn't produce to a kafka topic (directly) but 
only updates a state store also be considered a sink processor? I think yes.


I'll submit a PR to that effect unless I hear otherwise.

Cheers,

Michał

--
Michal Borowiecki
Senior Software Engineer L4
T: +44 208 742 1600 / +44 203 249 8448
E: michal.borowie...@openbet.com
W: www.openbet.com
OpenBet Ltd, Chiswick Park Building 9, 566 Chiswick High Rd, London, W4 5XT, UK








[GitHub] kafka pull request #3226: MINOR: Fix grammar

2017-06-03 Thread mihbor
GitHub user mihbor opened a pull request:

https://github.com/apache/kafka/pull/3226

MINOR: Fix grammar



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mihbor/kafka patch-6

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3226.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3226


commit 82ea1be4721f53535bef1b2acbb580ae7625970b
Author: mihbor 
Date:   2017-06-03T08:18:36Z

MINOR: Fix grammar




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #3225: KAFKA-5371: Increase request timeout for producer ...

2017-06-03 Thread ijuma
GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/3225

KAFKA-5371: Increase request timeout for producer used by 
testReachableServer

500ms is low for a shared Jenkins environment.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka 
kafka-5371-flaky-testReachableServer

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3225.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3225


commit 5f9b0df4c296d80d64091da3b4b0106816643114
Author: Ismael Juma 
Date:   2017-06-03T08:18:21Z

KAFKA-5371: Increase request timeout for producer used by 
testReachableServer

500ms is low for a shared Jenkins environment.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-5371) SyncProducerTest.testReachableServer has become flaky

2017-06-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035891#comment-16035891
 ] 

ASF GitHub Bot commented on KAFKA-5371:
---

GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/3225

KAFKA-5371: Increase request timeout for producer used by 
testReachableServer

500ms is low for a shared Jenkins environment.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka 
kafka-5371-flaky-testReachableServer

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3225.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3225


commit 5f9b0df4c296d80d64091da3b4b0106816643114
Author: Ismael Juma 
Date:   2017-06-03T08:18:21Z

KAFKA-5371: Increase request timeout for producer used by 
testReachableServer

500ms is low for a shared Jenkins environment.




> SyncProducerTest.testReachableServer has become flaky
> -
>
> Key: KAFKA-5371
> URL: https://issues.apache.org/jira/browse/KAFKA-5371
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>
> This test has started failing recently on jenkins with the following 
> {noformat}
> org.scalatest.junit.JUnitTestFailedError: Unexpected failure sending message 
> to broker. null
>   at 
> org.scalatest.junit.AssertionsForJUnit$class.newAssertionFailedException(AssertionsForJUnit.scala:100)
>   at 
> org.scalatest.junit.JUnitSuite.newAssertionFailedException(JUnitSuite.scala:71)
>   at org.scalatest.Assertions$class.fail(Assertions.scala:1089)
>   at org.scalatest.junit.JUnitSuite.fail(JUnitSuite.scala:71)
>   at 
> kafka.producer.SyncProducerTest.testReachableServer(SyncProducerTest.scala:71)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:114)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:57)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:66)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>   at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHan

[GitHub] kafka pull request #3224: MINOR: mention IQ use-case in the Overview

2017-06-03 Thread mihbor
GitHub user mihbor opened a pull request:

https://github.com/apache/kafka/pull/3224

MINOR: mention IQ use-case in the Overview



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mihbor/kafka patch-5

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3224.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3224


commit f4fc048d2df3be564c493da55607eec9ffce576b
Author: mihbor 
Date:   2017-06-03T08:12:01Z

MINOR: mention IQ use-case in the Overview




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [VOTE] KIP-117: Add a public AdminClient API for Kafka admin operations

2017-06-03 Thread Ismael Juma
Hi Colin,

Thanks for the feedback. Regarding the behaviour for brokers older than
0.11.0, I had gone for the Javadoc note because it made it possible to
avoid the inefficiency of getting all topics for users who have disabled
auto topic creation.

After some thought and discussion, I agree that keeping the behaviour
consistent across broker versions is the better option, so the PR was
updated to do that.

Ismael

On Mon, May 22, 2017 at 7:42 PM, Colin McCabe  wrote:

> > As you noted, though, we don't have a way to do this for the 0.10.x
> > releases.  It seems a bit harsh to have such different behavior there.
> > Is there a way that we can fudge this a bit so that it mostly works?
> > For example, when communicating with 0.10.x brokers, describeTopics
> > could do a MetadataRequest(topics=*) to filter out non-existent topics.
> >
> > This would obviously have a bit of a time-of-check, time-of-use race
> > condition since we're making two calls.  And also a scalability problem
> > since we're using topics=*.  Is it worth it to make the behavior saner
> > on older brokers?  Or should we add a JavaDoc note and move on?
> >
> > best,
> > Colin
> >
> >
> > On Fri, May 19, 2017, at 05:46, Ismael Juma wrote:
> > > Hi all,
> > >
> > > Feedback from people who tried the AdminClient is that auto topic creation
> > > during describe is unexpected and confusing. This is consistent with the
> > > reaction of most people when they learn that MetadataRequest can cause
> > > topics to be created. We had assumed that we'd tackle this issue for all
> > > the clients as part of deprecation of server-side auto topic creation in
> > > favour of client-side auto-topic creation.
> > >
> > > However, it would be better to do the right thing for the AdminClient from
> > > the start. Users will be less confused and we won't have to deal with
> > > compatibility concerns. Jason suggested a simple solution: make it possible
> > > to disallow auto topic creation when sending the metadata request. The
> > > AdminClient would take advantage of this now (i.e. 0.11.0.0) while the
> > > producer and consumer would retain the existing behaviour. In a subsequent
> > > release, we'll work out the details of how to move away from server-side
> > > auto topic creation for the producer and consumer (taking into account the
> > > compatibility impact).
> > >
> > > Because of the protocol change, this solution would only help in cases
> > > where the AdminClient is describing topics from a 0.11.0.0 or newer
> > > broker.
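
A hedged sketch of the protocol-level idea described above: the metadata request carries a flag telling the broker not to auto-create the listed topics. The Builder signature shown here is an assumption based on the change described in this thread, not a confirmed API:

{noformat}
import java.util.Collections;
import org.apache.kafka.common.requests.MetadataRequest;

public class MetadataSketch {
    public static void main(String[] args) {
        // Hypothetical: a metadata request for one topic that asks the broker
        // NOT to auto-create it if it does not exist.
        MetadataRequest.Builder builder = new MetadataRequest.Builder(
            Collections.singletonList("my-topic"),
            false /* allowAutoTopicCreation */);
        System.out.println(builder);
    }
}
{noformat}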
> > >
> > > I submitted a PR for this and it's small and straightforward:
> > >
> > > https://github.com/apache/kafka/pull/3098
> > >
> > > Thoughts?
> > >
> > > Ismael
> > >
> > > On Sat, Mar 25, 2017 at 1:25 AM, Colin McCabe  wrote:
> > >
> > > > With binding +1 votes from Gwen Shapira, Sriram Subramanian, and Grant
> > > > Henke, and a non-binding vote from Dong Lin, the vote passes.  There
> > > > were no +0 or -1 votes.  As mentioned earlier, the interface will be
> > > > unstable at first and we will continue to evolve it.
> > > >
> > > > thanks,
> > > > Colin McCabe
> > > >
> > > >
> > > > On Wed, Mar 22, 2017, at 10:21, Colin McCabe wrote:
> > > > > On Fri, Mar 17, 2017, at 10:50, Jun Rao wrote:
> > > > > > Hi, Colin,
> > > > > >
> > > > > > Thanks for the KIP. Looks good overall. A few comments below.
> > > > > >
> > > > > > 1. Sometimes we return CompletableFuture<Map<String, ...>>
> > > > > > and some other times we return Map<String, CompletableFuture<...>>,
> > > > > > which doesn't seem consistent. Is that intentional?
> > > > >
> > > > > Yes, this is intentional.  We got feedback from some people that they
> > > > > wanted a single future that would fail if anything failed.  Other people
> > > > > wanted to be able to detect failures on individual elements of a batch.
> > > > > This API lets us have both (you just choose which future you want to
> > > > > wait on).
> > > > >
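
The two shapes can coexist in one result object. A minimal sketch with hypothetical names (the real AdminClient result classes may differ): per-element futures via values(), one combined future via all():

{noformat}
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

class TopicDescription { }

class DescribeTopicsResultSketch {
    private final Map<String, CompletableFuture<TopicDescription>> futures;

    DescribeTopicsResultSketch(Map<String, CompletableFuture<TopicDescription>> futures) {
        this.futures = futures;
    }

    // Per-topic futures: callers can react to individual failures in the batch.
    Map<String, CompletableFuture<TopicDescription>> values() {
        return futures;
    }

    // One combined future: completes only if every element succeeds.
    CompletableFuture<Map<String, TopicDescription>> all() {
        return CompletableFuture
            .allOf(futures.values().toArray(new CompletableFuture[0]))
            .thenApply(v -> futures.entrySet().stream()
                .collect(Collectors.toMap(Map.Entry::getKey, e -> e.getValue().join())));
    }
}
{noformat}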
> > > > > >
> > > > > > 2. We support batching in CreateTopic/DeleteTopic/ListTopic, but not in
> > > > > > DescribeTopic. Should we add batching in DescribeTopic to make it
> > > > > > consistent?
> > > > >
> > > > > Good idea.  Let's add batching to DescribeTopic(s).
> > > > >
> > > > > > Also, both ListTopic and DescribeTopic seem to return
> > > > > > TopicDescription. Could we just consolidate the two by just keeping
> > > > > > DescribeTopic?
> > > > >
> > > > > Sorry, that was a typo.  ListTopics is supposed to return TopicListing,
> > > > > which tells you only the name of the topic and whether it is internal.
> > > > > The idea is that later we will add another RPC which allows us to fetch
> > > > > just this information, and not the other topic fields. That way, we can
> > > > > be more efficient.  The idea is that ListTopics is like readdir()/ls and
> > > > > DescribeTopics is like stat().  Getting detailed information about
> > > > > 1,000s of topics could be quite a resourc

Re: [DISCUSS] KIP-147: Add missing type parameters to StateStoreSupplier factories and KGroupedStream/Table methods

2017-06-03 Thread Damian Guy
Hi Michal,

Thanks for the KIP - is there a way we can do this without having to
introduce the new Typed.. Interfaces, overloaded methods etc? Is it
possible that we just need to provide a couple of new methods on
PersistentKeyValueFactory for windowed and sessionWindowed to return
interfaces like you've introduced in TypedStores?
I admit I haven't looked in much detail at whether that would work.

My concern is that this is duplicating a bunch of code and increasing the
surface area for minimal benefit. It is one of those cases where
I'd love not to have to maintain backward compatibility.

Thanks,
Damian

On Fri, 2 Jun 2017 at 08:20 Michal Borowiecki 
wrote:

> Thanks Matthias,
>
> I appreciate people are busy now preparing the 0.11 release.
>
> One thing I would also appreciate input on is perhaps a better name for
> the new TypedStores class, I just picked it quickly but don't really like
> it.
>
> Perhaps StateStores would make for a better name?
> Cheers,
> Michal
>
>
> On 02/06/17 07:18, Matthias J. Sax wrote:
>
> Thanks for the update Michal.
>
> I skimmed the PR. Looks good to me, as far as I can tell. Maybe
> Damian, Xavier, or Ismael can comment on this. Would be good to get
> confirmation that the change is backward compatible.
>
>
> -Matthias
>
>
> On 5/27/17 11:11 AM, Michal Borowiecki wrote:
>
> Hi all,
>
> I've updated the KIP to reflect the proposed backwards-compatible approach:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=69408481
>
>
> Given the vast area of APIs affected, I think the PR is easier to read
> than the code excerpts in the KIP 
> itself: https://github.com/apache/kafka/pull/2992/files
>
> Thanks,
> Michał
>
> On 07/05/17 10:16, Eno Thereska wrote:
>
> I like this KIP in general and I agree it’s needed. Perhaps Damian can 
> comment on the session store issue?
>
> Thanks
> Eno
>
> On May 6, 2017, at 10:32 PM, Michal Borowiecki 
>   wrote:
>
> Hi Matthias,
>
> Agreed. I tried your proposal and indeed it would work.
>
> However, I think to maintain full backward compatibility we would also need 
> to deprecate Stores.create() and leave it unchanged, while providing a new 
> method that returns the more strongly typed Factories.
>
> ( This is because PersistentWindowFactory and PersistentSessionFactory cannot 
> extend the existing PersistentKeyValueFactory interface, since their build() 
> methods will be returning TypedStateStoreSupplier<WindowStore<K, V>> and 
> TypedStateStoreSupplier<SessionStore<K, V>> respectively, which are NOT 
> subclasses of TypedStateStoreSupplier<KeyValueStore<K, V>>. I do not see 
> another way around it. Admittedly, my type covariance skills are rudimentary. 
> Does anyone see a better way around this? )
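
For illustration, a minimal self-contained sketch (simplified, hypothetical types) of the variance issue: Java generics are invariant, so the typed suppliers have no subtype relationship with each other, while a bounded wildcard accepts all of them:

{noformat}
// Simplified stand-ins for the store hierarchy discussed in KIP-147.
interface StateStore { }
interface KeyValueStore<K, V> extends StateStore { }
interface WindowStore<K, V> extends StateStore { }

interface TypedStateStoreSupplier<T extends StateStore> {
    T get();
}

class VarianceSketch {
    // A bounded wildcard accepts any of the typed suppliers.
    static void use(TypedStateStoreSupplier<? extends StateStore> supplier) { }

    static void demo(TypedStateStoreSupplier<WindowStore<String, Long>> windowed,
                     TypedStateStoreSupplier<KeyValueStore<String, Long>> keyValue) {
        use(windowed);  // compiles: the wildcard is covariant
        use(keyValue);  // compiles
        // TypedStateStoreSupplier<StateStore> s = windowed;  // would NOT compile: invariance
    }
}
{noformat}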
>
> Since create() takes only the store name as argument, and I don't see what we 
> could overload it with, the new method would need to have a different name.
>
> Alternatively, since create(String) is the only method in Stores, we could 
> deprecate the entire class and provide a new one. That would be my 
> preference. Any ideas what to call it?
>
>
>
> All comments and suggestions appreciated.
>
>
>
> Cheers,
>
> Michał
>
>
> On 04/05/17 21:48, Matthias J. Sax wrote:
>
> I had a quick look into this.
>
> With regard to backward compatibility, I think it would be required to
> introduce a new type `TypedStateStoreSupplier` (that extends
> `StateStoreSupplier`) and to overload all methods that take a
> `StateStoreSupplier` so they accept the new type instead of the current one.
>
> This would allow `.build` to return a `TypedStateStoreSupplier` and
> thus would not break any code. At least if I did not miss anything with
> regard to some magic of type inference using generics (I am not an
> expert in this field).
>
>
> -Matthias
>
> On 5/4/17 11:32 AM, Matthias J. Sax wrote:
>
> Did not have time to have a look. But backward compatibility is a must
> from my point of view.
>
> -Matthias
>
>
> On 5/4/17 12:56 AM, Michal Borowiecki wrote:
>
> Hello,
>
> I've updated the KIP with missing information.
>
> I would especially appreciate some comments on the compatibility aspects
> of this as the proposed change is not fully backwards-compatible.
>
> In the absence of comments I shall call for a vote in the next few days.
>
> Thanks,
>
> Michal
>
>
> On 30/04/17 23:11, Michal Borowiecki wrote:
>
> Hi community!
>
> I have just drafted KIP-147: Add missing type parameters to
> StateStoreSupplier factories and KGroupedStream/Table 
> methods
> 
>
> Please let me know if this a step in the right direction.
>
> All comments welcome.
>
> Thanks,
> Michal
> --

[jira] [Updated] (KAFKA-3096) Leader is not set to -1 when it is shutdown if followers are down

2017-06-03 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3096:
---
Fix Version/s: 0.11.1.0

> Leader is not set to -1 when it is shutdown if followers are down
> -
>
> Key: KAFKA-3096
> URL: https://issues.apache.org/jira/browse/KAFKA-3096
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.0
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>  Labels: reliability
> Fix For: 0.11.1.0
>
>
> Assuming a cluster with 2 brokers with unclean leader election disabled:
> 1. Start brokers 0 and 1
> 2. Perform partition assignment
> 3. Broker 0 is elected leader
> 4. Produce message and wait until metadata is propagated
> 5. Shutdown follower
> 6. Produce message
> 7. Shutdown leader
> 8. Start follower
> 9. Wait for leader election
> We have a test for this, but a bug in `waitUntilLeaderIsElectedOrChanged` 
> means that `newLeaderOpt` is not being checked.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-3143) inconsistent state in ZK when all replicas are dead

2017-06-03 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3143:
---
Fix Version/s: 0.11.1.0

> inconsistent state in ZK when all replicas are dead
> ---
>
> Key: KAFKA-3143
> URL: https://issues.apache.org/jira/browse/KAFKA-3143
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jun Rao
>Assignee: Ismael Juma
>  Labels: reliability
> Fix For: 0.11.1.0
>
>
> This issue can be recreated in the following steps.
> 1. Start 3 brokers, 1, 2 and 3.
> 2. Create a topic with a single partition and 2 replicas, say on broker 1 and 
> 2.
> If we stop both replicas 1 and 2, depending on where the controller is, the 
> leader and isr stored in ZK in the end are different.
> If the controller is on broker 3, what's stored in ZK will be -1 for leader 
> and an empty set for ISR.
> On the other hand, if the controller is on broker 2 and we stop broker 1 
> followed by broker 2, what's stored in ZK will be 2 for leader and 2 for ISR.
> The issue is that in the first case, the controller will call 
> ReplicaStateMachine to transition to OfflineReplica, which will change the 
> leader and isr. However, in the second case, the controller fails over, but 
> we don't transition ReplicaStateMachine to OfflineReplica during controller 
> initialization.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-3866) KerberosLogin refresh time bug and other improvements

2017-06-03 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3866:
---
Fix Version/s: (was: 0.11.0.1)
   0.11.0.0

> KerberosLogin refresh time bug and other improvements
> -
>
> Key: KAFKA-3866
> URL: https://issues.apache.org/jira/browse/KAFKA-3866
> Project: Kafka
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.10.0.0
>Reporter: Ismael Juma
>Assignee: Ismael Juma
> Fix For: 0.11.0.0
>
>
> ZOOKEEPER-2295 describes a bug in the Kerberos refresh time logic that is 
> also present in our KerberosLogin class. While looking at the code, I found a 
> number of things that could be improved. More details in the PR.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-3881) Remove the replacing logic from "." to "_" in Fetcher

2017-06-03 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3881:
---
Labels: newbie  (was: )

> Remove the replacing logic from "." to "_" in Fetcher
> -
>
> Key: KAFKA-3881
> URL: https://issues.apache.org/jira/browse/KAFKA-3881
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, metrics
>Reporter: Guozhang Wang
>Assignee: Ismael Juma
>  Labels: newbie
>
> The logic of replacing "." with "_" in metric names / tags was originally 
> introduced in the core package's metrics since Graphite treats "." as a 
> hierarchy separator (see KAFKA-1902); for the client metrics, the 
> GraphiteReporter is supposed to take care of this itself rather than having 
> Kafka metrics special-case it. In addition, right now only the consumer 
> Fetcher performs the replacement; the producer Sender does not.
> So we should consider removing this logic from the consumer Fetcher's metrics 
> package. NOTE that this is a backward-incompatible public API change.
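
For illustration, a hedged sketch of the reporter-side escaping argued for above (an illustrative helper, not an existing Kafka API):

{noformat}
public class GraphiteNameSketch {
    // Graphite treats '.' as its hierarchy separator, so a Graphite reporter
    // can escape it itself instead of Kafka's metrics layer doing it.
    static String graphiteSafe(String metricName) {
        return metricName.replace('.', '_');
    }
}
{noformat}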



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (KAFKA-3881) Remove the replacing logic from "." to "_" in Fetcher

2017-06-03 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma reassigned KAFKA-3881:
--

Assignee: (was: Ismael Juma)

> Remove the replacing logic from "." to "_" in Fetcher
> -
>
> Key: KAFKA-3881
> URL: https://issues.apache.org/jira/browse/KAFKA-3881
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, metrics
>Reporter: Guozhang Wang
>  Labels: newbie
>
> The logic of replacing "." with "_" in metric names / tags was originally 
> introduced in the core package's metrics since Graphite treats "." as a 
> hierarchy separator (see KAFKA-1902); for the client metrics, the 
> GraphiteReporter is supposed to take care of this itself rather than having 
> Kafka metrics special-case it. In addition, right now only the consumer 
> Fetcher performs the replacement; the producer Sender does not.
> So we should consider removing this logic from the consumer Fetcher's metrics 
> package. NOTE that this is a backward-incompatible public API change.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (KAFKA-4503) Expose the log dir for a partition as a metric

2017-06-03 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma reassigned KAFKA-4503:
--

Assignee: (was: Ismael Juma)

> Expose the log dir for a partition as a metric
> --
>
> Key: KAFKA-4503
> URL: https://issues.apache.org/jira/browse/KAFKA-4503
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Ismael Juma
>
> It would be useful to be able to map a partition to a log directory if 
> multiple log directories are used.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5031) Additional validation in validateMessagesAndAssignOffsets

2017-06-03 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035884#comment-16035884
 ] 

Ismael Juma commented on KAFKA-5031:


If we don't add this validation now, we may not be able to add it later without 
breaking non-compliant clients.

> Additional validation in validateMessagesAndAssignOffsets
> -
>
> Key: KAFKA-5031
> URL: https://issues.apache.org/jira/browse/KAFKA-5031
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, core, producer 
>Reporter: Ismael Juma
>Priority: Critical
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
>
> In validateMessagesAndAssignOffsets(), when validating the 
> DefaultRecordBatch, we should also validate:
> 1. Message count matches the actual number of messages in the array
> 2. The header count matches the actual number of headers
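
A hedged sketch of what the extra checks could look like; the parameters stand in for DefaultRecordBatch accessors, whose real names may differ:

{noformat}
public class BatchValidationSketch {
    // Check 1: the record count claimed in the batch header matches the
    // number of records actually present.
    static void validateRecordCount(int countFromBatchHeader, Iterable<?> records) {
        int actual = 0;
        for (Object r : records)
            actual++;
        if (actual != countFromBatchHeader)
            throw new IllegalStateException("Batch header claims "
                + countFromBatchHeader + " records but found " + actual);
    }

    // Check 2: the header count claimed in a record matches the headers found.
    static void validateHeaderCount(int headerCountFromRecord, int actualHeaders) {
        if (actualHeaders != headerCountFromRecord)
            throw new IllegalStateException("Record claims " + headerCountFromRecord
                + " headers but found " + actualHeaders);
    }
}
{noformat}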



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5031) Additional validation in validateMessagesAndAssignOffsets

2017-06-03 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-5031:
---
Priority: Critical  (was: Major)

> Additional validation in validateMessagesAndAssignOffsets
> -
>
> Key: KAFKA-5031
> URL: https://issues.apache.org/jira/browse/KAFKA-5031
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, core, producer 
>Reporter: Ismael Juma
>Priority: Critical
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
>
> In validateMessagesAndAssignOffsets(), when validating the 
> DefaultRecordBatch, we should also validate:
> 1. Message count matches the actual number of messages in the array
> 2. The header count matches the actual number of headers



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5214) KafkaAdminClient#apiVersions should return a public class

2017-06-03 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-5214:
---
Priority: Blocker  (was: Major)

> KafkaAdminClient#apiVersions should return a public class
> -
>
> Key: KAFKA-5214
> URL: https://issues.apache.org/jira/browse/KAFKA-5214
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
>Priority: Blocker
> Fix For: 0.11.0.0
>
>
> KafkaAdminClient#apiVersions should not refer to internal classes like 
> ApiKeys, NodeApiVersions, etc.  Instead, we should have stable public classes 
> to represent these things in the API.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5214) KafkaAdminClient#apiVersions should return a public class

2017-06-03 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-5214:
---
Fix Version/s: 0.11.0.0

> KafkaAdminClient#apiVersions should return a public class
> -
>
> Key: KAFKA-5214
> URL: https://issues.apache.org/jira/browse/KAFKA-5214
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
> Fix For: 0.11.0.0
>
>
> KafkaAdminClient#apiVersions should not refer to internal classes like 
> ApiKeys, NodeApiVersions, etc.  Instead, we should have stable public classes 
> to represent these things in the API.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5325) Connection Lose during Kafka Kerberos Renewal process

2017-06-03 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-5325:
---
Status: Patch Available  (was: Open)

> Connection Lose during Kafka Kerberos Renewal process
> -
>
> Key: KAFKA-5325
> URL: https://issues.apache.org/jira/browse/KAFKA-5325
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.9.0.0
>Reporter: MuthuKumar
>Assignee: Rajini Sivaram
> Fix For: 0.11.0.0
>
>
> During Kerberos ticket renewal, all requests that reach the server between 
> the renewal ticket logout and re-login fail with the error mentioned below.
> kafka-clients-0.9.0.0.jar is being used on the producer end; Kafka 0.9.0.0 is 
> used there because the server is running 0.10.0.x.
> OS: Oracle Linux Server release 6.7
> Kerberos Configuration - Producer end
> -
> KafkaClient {
> com.sun.security.auth.module.Krb5LoginModule required
> refreshKrb5Config=true
> principal="u...@.com"
> useKeyTab=true
> serviceName="kafka"
> keyTab="x.keytab"
> client=true;
> };
> Application Log
> ---
> 2017-05-25 02:20:37,515 INF [Login.java:354] Initiating logout for 
> u...@.com
> 2017-05-25 02:20:37,515 INF [Login.java:365] Initiating re-login for 
> u...@.com
> 2017-05-25 02:20:37,525 INF [SaslChannelBuilder.java:91] Failed to create 
> channel due to
> org.apache.kafka.common.KafkaException: Failed to configure 
> SaslClientAuthenticator
> at 
> org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.configure(SaslClientAuthenticator.java:94)
> at 
> org.apache.kafka.common.network.SaslChannelBuilder.buildChannel(SaslChannelBuilder.java:88)
> at org.apache.kafka.common.network.Selector.connect(Selector.java:162)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:514)
> at 
> org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:169)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:180)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.NoSuchElementException: null
> at java.util.LinkedList$ListItr.next(LinkedList.java:890)
> at javax.security.auth.Subject$SecureSet$1.next(Subject.java:1056)
> at 
> org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.configure(SaslClientAuthenticator.java:90)
> ... 7 common frames omitted
> 2017-05-25 02:20:37,526 ERR [Sender.java:130] Uncaught error in kafka 
> producer I/O thread:
> org.apache.kafka.common.KafkaException: 
> org.apache.kafka.common.KafkaException: Failed to configure 
> SaslClientAuthenticator
> at 
> org.apache.kafka.common.network.SaslChannelBuilder.buildChannel(SaslChannelBuilder.java:92)
> at org.apache.kafka.common.network.Selector.connect(Selector.java:162)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:514)
> at 
> org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:169)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:180)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.kafka.common.KafkaException: Failed to configure 
> SaslClientAuthenticator
> at 
> org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.configure(SaslClientAuthenticator.java:94)
> at 
> org.apache.kafka.common.network.SaslChannelBuilder.buildChannel(SaslChannelBuilder.java:88)
> ... 6 common frames omitted
> Caused by: java.util.NoSuchElementException: null
> at java.util.LinkedList$ListItr.next(LinkedList.java:890)
> at javax.security.auth.Subject$SecureSet$1.next(Subject.java:1056)
> at 
> org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.configure(SaslClientAuthenticator.java:90)
> ... 7 common frames omitted
> 2017-05-25 02:20:37,536 ERR [Sender.java:130] Uncaught error in kafka 
> producer I/O thread:
> java.lang.NullPointerException: null
> 2017-05-25 02:20:37,536 ERR [Sender.java:130] Uncaught error in kafka 
> producer I/O thread:
> java.lang.NullPointerException: null
> 2017-05-25 02:20:37,536 ERR [Sender.java:130] Uncaught error in kafka 
> producer I/O thread:
> java.lang.NullPointerException: null



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5325) Connection Lose during Kafka Kerberos Renewal process

2017-06-03 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-5325:
---
Fix Version/s: 0.11.0.0

> Connection Lose during Kafka Kerberos Renewal process
> -
>
> Key: KAFKA-5325
> URL: https://issues.apache.org/jira/browse/KAFKA-5325
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.9.0.0
>Reporter: MuthuKumar
>Assignee: Rajini Sivaram
> Fix For: 0.11.0.0
>
>
> During Kerberos ticket renewal, all requests that reach the server between 
> the renewal ticket logout and re-login fail with the error mentioned below.
> kafka-clients-0.9.0.0.jar is being used on the producer end; Kafka 0.9.0.0 is 
> used there because the server is running 0.10.0.x.
> OS: Oracle Linux Server release 6.7
> Kerberos Configuration - Producer end
> -
> KafkaClient {
> com.sun.security.auth.module.Krb5LoginModule required
> refreshKrb5Config=true
> principal="u...@.com"
> useKeyTab=true
> serviceName="kafka"
> keyTab="x.keytab"
> client=true;
> };
> Application Log
> ---
> 2017-05-25 02:20:37,515 INF [Login.java:354] Initiating logout for 
> u...@.com
> 2017-05-25 02:20:37,515 INF [Login.java:365] Initiating re-login for 
> u...@.com
> 2017-05-25 02:20:37,525 INF [SaslChannelBuilder.java:91] Failed to create 
> channel due to
> org.apache.kafka.common.KafkaException: Failed to configure 
> SaslClientAuthenticator
> at 
> org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.configure(SaslClientAuthenticator.java:94)
> at 
> org.apache.kafka.common.network.SaslChannelBuilder.buildChannel(SaslChannelBuilder.java:88)
> at org.apache.kafka.common.network.Selector.connect(Selector.java:162)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:514)
> at 
> org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:169)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:180)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.NoSuchElementException: null
> at java.util.LinkedList$ListItr.next(LinkedList.java:890)
> at javax.security.auth.Subject$SecureSet$1.next(Subject.java:1056)
> at 
> org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.configure(SaslClientAuthenticator.java:90)
> ... 7 common frames omitted
> 2017-05-25 02:20:37,526 ERR [Sender.java:130] Uncaught error in kafka 
> producer I/O thread:
> org.apache.kafka.common.KafkaException: 
> org.apache.kafka.common.KafkaException: Failed to configure 
> SaslClientAuthenticator
> at 
> org.apache.kafka.common.network.SaslChannelBuilder.buildChannel(SaslChannelBuilder.java:92)
> at org.apache.kafka.common.network.Selector.connect(Selector.java:162)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:514)
> at 
> org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:169)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:180)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.kafka.common.KafkaException: Failed to configure 
> SaslClientAuthenticator
> at 
> org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.configure(SaslClientAuthenticator.java:94)
> at 
> org.apache.kafka.common.network.SaslChannelBuilder.buildChannel(SaslChannelBuilder.java:88)
> ... 6 common frames omitted
> Caused by: java.util.NoSuchElementException: null
> at java.util.LinkedList$ListItr.next(LinkedList.java:890)
> at javax.security.auth.Subject$SecureSet$1.next(Subject.java:1056)
> at 
> org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.configure(SaslClientAuthenticator.java:90)
> ... 7 common frames omitted
> 2017-05-25 02:20:37,536 ERR [Sender.java:130] Uncaught error in kafka 
> producer I/O thread:
> java.lang.NullPointerException: null
> 2017-05-25 02:20:37,536 ERR [Sender.java:130] Uncaught error in kafka 
> producer I/O thread:
> java.lang.NullPointerException: null
> 2017-05-25 02:20:37,536 ERR [Sender.java:130] Uncaught error in kafka 
> producer I/O thread:
> java.lang.NullPointerException: null



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build is back to normal : kafka-trunk-jdk8 #1648

2017-06-03 Thread Apache Jenkins Server
See 




[jira] [Updated] (KAFKA-5355) Broker returns messages beyond "latest stable offset" to transactional consumer in read_committed mode

2017-06-03 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-5355:
---
Status: In Progress  (was: Patch Available)

> Broker returns messages beyond "latest stable offset" to transactional 
> consumer in read_committed mode
> --
>
> Key: KAFKA-5355
> URL: https://issues.apache.org/jira/browse/KAFKA-5355
> Project: Kafka
>  Issue Type: Sub-task
>  Components: core
>Affects Versions: 0.11.0.0
>Reporter: Matthias J. Sax
>Assignee: Jason Gustafson
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
> Attachments: test.log
>
>
> This issue is exposed by the new Streams EOS integration test.
> Streams has two tasks (i.e., two producers with {{pid}} 0 and 2000), both 
> writing to output topic {{output}} with one partition (replication factor 1).
> The test uses a transactional consumer with {{group.id=readCommitted}} to 
> read the data from the {{output}} topic. When it reads the data, each producer has 
> committed 10 records (one producer writes messages with {{key=0}} and the 
> other with {{key=1}}). Furthermore, each producer has an open transaction and 
> 5 uncommitted records written.
> The test fails, as we expect to see 10 records per key, but we get 15 for 
> key=1:
> {noformat}
> java.lang.AssertionError: 
> Expected: <[KeyValue(1, 0), KeyValue(1, 1), KeyValue(1, 3), KeyValue(1, 6), 
> KeyValue(1, 10), KeyValue(1, 15), KeyValue(1, 21), KeyValue(1, 28), 
> KeyValue(1, 36), KeyValue(1, 45)]>
>  but: was <[KeyValue(1, 0), KeyValue(1, 1), KeyValue(1, 3), KeyValue(1, 
> 6), KeyValue(1, 10), KeyValue(1, 15), KeyValue(1, 21), KeyValue(1, 28), 
> KeyValue(1, 36), KeyValue(1, 45), KeyValue(1, 55), KeyValue(1, 66), 
> KeyValue(1, 78), KeyValue(1, 91), KeyValue(1, 105)]>
> {noformat}
> Dumping the segment shows that there are two commit markers (one for each 
> producer) for the first 10 messages written. Furthermore, there are 5 pending 
> records. Thus, the "latest stable offset" should be 21 (20 messages plus 2 commit 
> markers), and no data should be returned beyond this offset.
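
For context, a minimal sketch of the consumer setup the description implies (broker address and deserializers are assumptions): with isolation.level=read_committed the broker must not return records beyond the last stable offset:

{noformat}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ReadCommittedConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "readCommitted");
        // The setting under test: only consume committed transactional data.
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
            "org.apache.kafka.common.serialization.LongDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
            "org.apache.kafka.common.serialization.LongDeserializer");
        try (KafkaConsumer<Long, Long> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("output"));
        }
    }
}
{noformat}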
> Dumped Log Segment {{output-0}}
> {noformat}
> Starting offset: 0
> baseOffset: 0 lastOffset: 9 baseSequence: 0 lastSequence: 9 producerId: 0 
> producerEpoch: 6 partitionLeaderEpoch: 0 isTransactional: true position: 0 
> CreateTime: 1496255947332 isvalid: true size: 291 magic: 2 compresscodec: 
> NONE crc: 600535135
> baseOffset: 10 lastOffset: 10 baseSequence: -1 lastSequence: -1 producerId: 0 
> producerEpoch: 6 partitionLeaderEpoch: 0 isTransactional: true position: 291 
> CreateTime: 1496256005429 isvalid: true size: 78 magic: 2 compresscodec: NONE 
> crc: 3458060752
> baseOffset: 11 lastOffset: 20 baseSequence: 0 lastSequence: 9 producerId: 
> 2000 producerEpoch: 2 partitionLeaderEpoch: 0 isTransactional: true position: 
> 369 CreateTime: 1496255947322 isvalid: true size: 291 magic: 2 compresscodec: 
> NONE crc: 3392915713
> baseOffset: 21 lastOffset: 25 baseSequence: 10 lastSequence: 14 producerId: 0 
> producerEpoch: 6 partitionLeaderEpoch: 0 isTransactional: true position: 660 
> CreateTime: 1496255947342 isvalid: true size: 176 magic: 2 compresscodec: 
> NONE crc: 3513911368
> baseOffset: 26 lastOffset: 26 baseSequence: -1 lastSequence: -1 producerId: 
> 2000 producerEpoch: 2 partitionLeaderEpoch: 0 isTransactional: true position: 
> 836 CreateTime: 1496256011784 isvalid: true size: 78 magic: 2 compresscodec: 
> NONE crc: 1619151485
> {noformat}
> Dump with {{--deep-iteration}}
> {noformat}
> Starting offset: 0
> offset: 0 position: 0 CreateTime: 1496255947323 isvalid: true keysize: 8 
> valuesize: 8 magic: 2 compresscodec: NONE crc: 600535135 sequence: 0 
> headerKeys: [] key: 1 payload: 0
> offset: 1 position: 0 CreateTime: 1496255947324 isvalid: true keysize: 8 
> valuesize: 8 magic: 2 compresscodec: NONE crc: 600535135 sequence: 1 
> headerKeys: [] key: 1 payload: 1
> offset: 2 position: 0 CreateTime: 1496255947325 isvalid: true keysize: 8 
> valuesize: 8 magic: 2 compresscodec: NONE crc: 600535135 sequence: 2 
> headerKeys: [] key: 1 payload: 3
> offset: 3 position: 0 CreateTime: 1496255947326 isvalid: true keysize: 8 
> valuesize: 8 magic: 2 compresscodec: NONE crc: 600535135 sequence: 3 
> headerKeys: [] key: 1 payload: 6
> offset: 4 position: 0 CreateTime: 1496255947327 isvalid: true keysize: 8 
> valuesize: 8 magic: 2 compresscodec: NONE crc: 600535135 sequence: 4 
> headerKeys: [] key: 1 payload: 10
> offset: 5 position: 0 CreateTime: 1496255947328 isvalid: true keysize: 8 
> valuesize: 8 magic: 2 compresscodec: NONE crc: 600535135 sequence: 5 
> headerKeys: [] key: 1 payload: 15
> offset: 6 position: 0 CreateTime: 1496255947329 isvalid: true keysize: 8

[jira] [Updated] (KAFKA-5355) Broker returns messages beyond "latest stable offset" to transactional consumer in read_committed mode

2017-06-03 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-5355:
---
Status: Patch Available  (was: Open)

> Broker returns messages beyond "latest stable offset" to transactional 
> consumer in read_committed mode
> --
>
> Key: KAFKA-5355
> URL: https://issues.apache.org/jira/browse/KAFKA-5355
> Project: Kafka
>  Issue Type: Sub-task
>  Components: core
>Affects Versions: 0.11.0.0
>Reporter: Matthias J. Sax
>Assignee: Jason Gustafson
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
> Attachments: test.log
>
>
> This issue is exposed by the new Streams EOS integration test.
> Streams has two tasks (i.e., two producers with {{pid}} 0 and 2000), both 
> writing to output topic {{output}} with one partition (replication factor 1).
> The test uses a transactional consumer with {{group.id=readCommitted}} to 
> read the data from the {{output}} topic. When it reads the data, each producer has 
> committed 10 records (one producer writes messages with {{key=0}} and the 
> other with {{key=1}}). Furthermore, each producer has an open transaction and 
> 5 uncommitted records written.
> The test fails, as we expect to see 10 records per key, but we get 15 for 
> key=1:
> {noformat}
> java.lang.AssertionError: 
> Expected: <[KeyValue(1, 0), KeyValue(1, 1), KeyValue(1, 3), KeyValue(1, 6), 
> KeyValue(1, 10), KeyValue(1, 15), KeyValue(1, 21), KeyValue(1, 28), 
> KeyValue(1, 36), KeyValue(1, 45)]>
>  but: was <[KeyValue(1, 0), KeyValue(1, 1), KeyValue(1, 3), KeyValue(1, 
> 6), KeyValue(1, 10), KeyValue(1, 15), KeyValue(1, 21), KeyValue(1, 28), 
> KeyValue(1, 36), KeyValue(1, 45), KeyValue(1, 55), KeyValue(1, 66), 
> KeyValue(1, 78), KeyValue(1, 91), KeyValue(1, 105)]>
> {noformat}
> Dumping the segment shows that there are two commit markers (one for each 
> producer) for the first 10 messages written. Furthermore, there are 5 pending 
> records. Thus, the "latest stable offset" should be 21 (20 messages plus 2 commit 
> markers), and no data should be returned beyond this offset.
> Dumped Log Segment {{output-0}}
> {noformat}
> Starting offset: 0
> baseOffset: 0 lastOffset: 9 baseSequence: 0 lastSequence: 9 producerId: 0 
> producerEpoch: 6 partitionLeaderEpoch: 0 isTransactional: true position: 0 
> CreateTime: 1496255947332 isvalid: true size: 291 magic: 2 compresscodec: 
> NONE crc: 600535135
> baseOffset: 10 lastOffset: 10 baseSequence: -1 lastSequence: -1 producerId: 0 
> producerEpoch: 6 partitionLeaderEpoch: 0 isTransactional: true position: 291 
> CreateTime: 1496256005429 isvalid: true size: 78 magic: 2 compresscodec: NONE 
> crc: 3458060752
> baseOffset: 11 lastOffset: 20 baseSequence: 0 lastSequence: 9 producerId: 
> 2000 producerEpoch: 2 partitionLeaderEpoch: 0 isTransactional: true position: 
> 369 CreateTime: 1496255947322 isvalid: true size: 291 magic: 2 compresscodec: 
> NONE crc: 3392915713
> baseOffset: 21 lastOffset: 25 baseSequence: 10 lastSequence: 14 producerId: 0 
> producerEpoch: 6 partitionLeaderEpoch: 0 isTransactional: true position: 660 
> CreateTime: 1496255947342 isvalid: true size: 176 magic: 2 compresscodec: 
> NONE crc: 3513911368
> baseOffset: 26 lastOffset: 26 baseSequence: -1 lastSequence: -1 producerId: 
> 2000 producerEpoch: 2 partitionLeaderEpoch: 0 isTransactional: true position: 
> 836 CreateTime: 1496256011784 isvalid: true size: 78 magic: 2 compresscodec: 
> NONE crc: 1619151485
> {noformat}
> Dump with {{--deep-iteration}}
> {noformat}
> Starting offset: 0
> offset: 0 position: 0 CreateTime: 1496255947323 isvalid: true keysize: 8 
> valuesize: 8 magic: 2 compresscodec: NONE crc: 600535135 sequence: 0 
> headerKeys: [] key: 1 payload: 0
> offset: 1 position: 0 CreateTime: 1496255947324 isvalid: true keysize: 8 
> valuesize: 8 magic: 2 compresscodec: NONE crc: 600535135 sequence: 1 
> headerKeys: [] key: 1 payload: 1
> offset: 2 position: 0 CreateTime: 1496255947325 isvalid: true keysize: 8 
> valuesize: 8 magic: 2 compresscodec: NONE crc: 600535135 sequence: 2 
> headerKeys: [] key: 1 payload: 3
> offset: 3 position: 0 CreateTime: 1496255947326 isvalid: true keysize: 8 
> valuesize: 8 magic: 2 compresscodec: NONE crc: 600535135 sequence: 3 
> headerKeys: [] key: 1 payload: 6
> offset: 4 position: 0 CreateTime: 1496255947327 isvalid: true keysize: 8 
> valuesize: 8 magic: 2 compresscodec: NONE crc: 600535135 sequence: 4 
> headerKeys: [] key: 1 payload: 10
> offset: 5 position: 0 CreateTime: 1496255947328 isvalid: true keysize: 8 
> valuesize: 8 magic: 2 compresscodec: NONE crc: 600535135 sequence: 5 
> headerKeys: [] key: 1 payload: 15
> offset: 6 position: 0 CreateTime: 1496255947329 isvalid: true keysize: 8 
> val

[jira] [Commented] (KAFKA-5347) OutOfSequence error should be fatal

2017-06-03 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035883#comment-16035883
 ] 

Ismael Juma commented on KAFKA-5347:


[~apurva] [~hachikuji] This issue has no "Fix version". Is it something for 
0.11.0.0, 0.11.0.1 or 0.11.1.0?

> OutOfSequence error should be fatal
> ---
>
> Key: KAFKA-5347
> URL: https://issues.apache.org/jira/browse/KAFKA-5347
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jason Gustafson
>Assignee: Apurva Mehta
>  Labels: exactly-once
>
> If the producer sees an OutOfSequence error for a given partition, we 
> currently treat it as an abortable error. This makes some sense because 
> OutOfSequence won't prevent us from being able to send the EndTxn to abort 
> the transaction. The problem is that the producer, even after aborting, still 
> won't be able to send to the topic with an OutOfSequence. One way to deal 
> with this is to ask the user to call {{initTransactions()}} again to bump the 
> epoch, but this is a bit difficult to explain and could be dangerous since it 
> renders zombie checking less effective. Probably we should just consider 
> OutOfSequence fatal for the transactional producer.
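
A hedged sketch of the fatal-handling shape proposed here (producer setup omitted; only the error-handling structure is shown):

{noformat}
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.OutOfOrderSequenceException;

public class FatalSequenceSketch {
    static void sendOrDie(KafkaProducer<String, String> producer,
                          ProducerRecord<String, String> record)
            throws InterruptedException {
        try {
            producer.send(record).get();
        } catch (ExecutionException e) {
            if (e.getCause() instanceof OutOfOrderSequenceException) {
                // Treat as fatal for the transactional producer: close rather
                // than abort the transaction and retry.
                producer.close();
                throw new IllegalStateException("Fatal producer error", e.getCause());
            }
        }
    }
}
{noformat}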



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4815) Idempotent/transactional Producer Checklist (KIP-98)

2017-06-03 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4815:
---
Fix Version/s: (was: 0.11.1.0)
   0.11.0.0

> Idempotent/transactional Producer Checklist (KIP-98)
> 
>
> Key: KAFKA-4815
> URL: https://issues.apache.org/jira/browse/KAFKA-4815
> Project: Kafka
>  Issue Type: New Feature
>  Components: clients, core, producer 
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>  Labels: kip
> Fix For: 0.11.0.0
>
>
> This issue tracks implementation progress for KIP-98: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4815) Idempotent/transactional Producer Checklist (KIP-98)

2017-06-03 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4815:
---
Fix Version/s: (was: 0.11.0.0)
   0.11.1.0

> Idempotent/transactional Producer Checklist (KIP-98)
> 
>
> Key: KAFKA-4815
> URL: https://issues.apache.org/jira/browse/KAFKA-4815
> Project: Kafka
>  Issue Type: New Feature
>  Components: clients, core, producer 
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>  Labels: kip
> Fix For: 0.11.1.0
>
>
> This issue tracks implementation progress for KIP-98: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5020) Update protocol documentation to mention message format v2

2017-06-03 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035878#comment-16035878
 ] 

Ismael Juma commented on KAFKA-5020:


I reduced the priority to "Critical" because this can be added after the 
release goes out. It's a website update for client developers and doesn't need 
to block the release.

> Update protocol documentation to mention message format v2
> --
>
> Key: KAFKA-5020
> URL: https://issues.apache.org/jira/browse/KAFKA-5020
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, core, producer 
>Reporter: Ismael Juma
>Priority: Critical
>  Labels: documentation, exactly-once
> Fix For: 0.11.0.0
>
>
> Sections 5.3, 5.4 and 5.5 should be updated:
> https://kafka.apache.org/documentation/#messages
> We may want to mention record batches along with message sets here:
> https://kafka.apache.org/protocol#protocol_message_sets
> And we should update the wiki page linked from the protocol documentation:
> https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-Messagesets



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5021) Update Message Delivery Semantics section to take into account KIP-98

2017-06-03 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035879#comment-16035879
 ] 

Ismael Juma commented on KAFKA-5021:


Since this is a website update, it doesn't have to block the release.

> Update Message Delivery Semantics section to take into account KIP-98
> -
>
> Key: KAFKA-5021
> URL: https://issues.apache.org/jira/browse/KAFKA-5021
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, core, producer 
>Reporter: Ismael Juma
>Priority: Critical
>  Labels: documentation, exactly-once
> Fix For: 0.11.0.0
>
>
> Reference:
> https://kafka.apache.org/documentation/#semantics



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5021) Update Message Delivery Semantics section to take into account KIP-98

2017-06-03 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-5021:
---
Priority: Critical  (was: Blocker)

> Update Message Delivery Semantics section to take into account KIP-98
> -
>
> Key: KAFKA-5021
> URL: https://issues.apache.org/jira/browse/KAFKA-5021
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, core, producer 
>Reporter: Ismael Juma
>Priority: Critical
>  Labels: documentation, exactly-once
> Fix For: 0.11.0.0
>
>
> Reference:
> https://kafka.apache.org/documentation/#semantics



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5020) Update protocol documentation to mention message format v2

2017-06-03 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-5020:
---
Priority: Critical  (was: Blocker)

> Update protocol documentation to mention message format v2
> --
>
> Key: KAFKA-5020
> URL: https://issues.apache.org/jira/browse/KAFKA-5020
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, core, producer 
>Reporter: Ismael Juma
>Priority: Critical
>  Labels: documentation, exactly-once
> Fix For: 0.11.0.0
>
>
> Sections 5.3, 5.4 and 5.5 should be updated:
> https://kafka.apache.org/documentation/#messages
> We may want to mention record batches along with message sets here:
> https://kafka.apache.org/protocol#protocol_message_sets
> And we should update the wiki page linked from the protocol documentation:
> https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-Messagesets



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)