[GitHub] kafka pull request #3365: MINOR: Unused logger removed.

2017-06-16 Thread Kamal15
GitHub user Kamal15 opened a pull request:

https://github.com/apache/kafka/pull/3365

MINOR: Unused logger removed.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Kamal15/kafka logger

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3365.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3365


commit 5472063b733885e185c522869f29b8fbdaceea7b
Author: Kamal C 
Date:   2017-06-17T05:34:47Z

MINOR: Unused logger removed.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #3364: HOTFIX: Use ResourceType.toJava instead of Resourc...

2017-06-16 Thread ijuma
GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/3364

HOTFIX: Use ResourceType.toJava instead of ResourceType.fromString

The latter doesn't work for TransactionalId (or any type with two camel-case
words).
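
For illustration, a minimal self-contained sketch (hypothetical code, not Kafka's actual implementation) of why a capitalize-first-letter fromString breaks on a two-word camel-case name:

    public class ResourceTypeDemo {
        enum Type { Topic, Group, Cluster, TransactionalId }

        // Naive parsing: lower-case the tail and capitalize the first letter.
        // This can only ever reconstruct single-word enum names.
        static Type fromString(String s) {
            String normalized = s.substring(0, 1).toUpperCase() + s.substring(1).toLowerCase();
            return Type.valueOf(normalized);
        }

        public static void main(String[] args) {
            System.out.println(fromString("topic"));           // OK: Topic
            System.out.println(fromString("transactionalId")); // throws: looks up "Transactionalid"
        }
    }

Going through the Java enum's exact name (the toJava route) avoids reconstructing the string entirely.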

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka acls-fromString-fixes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3364.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3364


commit 346dc32e26ab3a48fae6f112e427d45546607fa8
Author: Ismael Juma 
Date:   2017-06-17T05:08:20Z

HOTFIX: Use ResourceType.toJava instead of ResourceType.fromString

The latter doesn't work for TransactionalId (or any type with two camel-case
words).




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #3363: MINOR: Some small cleanups/improvements to KAFKA-5...

2017-06-16 Thread hachikuji
GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/3363

MINOR: Some small cleanups/improvements to KAFKA-5031



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka KAFKA-5031-FOLLOWUP

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3363.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3363


commit 5bbfff43d5ab908e88811c61c9b1f86221b8cf6c
Author: Jason Gustafson 
Date:   2017-06-17T04:37:56Z

MINOR: Some small cleanups/improvements to KAFKA-5031




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #3156: KAFKA-5031: validate count of records for DefaultR...

2017-06-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3156


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #3362: MINOR: StreamThread standbyTask comment typo

2017-06-16 Thread zqhxuyuan
GitHub user zqhxuyuan opened a pull request:

https://github.com/apache/kafka/pull/3362

MINOR: StreamThread standbyTask comment typo

Log info typos: it should be the standby task, not the active task.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zqhxuyuan/kafka trunk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3362.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3362


commit c8d3cb48571804f1b9808e7096baf01cd90d5089
Author: zqhxuyuan 
Date:   2017-06-17T01:41:11Z

MINOR: StreamThread standbyTask comment typo




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is back to normal : kafka-trunk-jdk8 #1719

2017-06-16 Thread Apache Jenkins Server
See 




[GitHub] kafka pull request #3171: MINOR: A few cleanups in KafkaApis and Transaction...

2017-06-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3171


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #3361: KAFKA-5435: Improve producer state loading after f...

2017-06-16 Thread hachikuji
GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/3361

KAFKA-5435: Improve producer state loading after failure



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka KAFKA-5435-ALT

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3361.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3361


commit 5bf7d189589e4986e9552c955d101fe8e352d49c
Author: Jason Gustafson 
Date:   2017-06-17T00:25:12Z

KAFKA-5435: Improve producer state loading after failure




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #3306: KAFKA-5435: Ensure producer snapshot retained afte...

2017-06-16 Thread hachikuji
Github user hachikuji closed the pull request at:

https://github.com/apache/kafka/pull/3306


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #3360: KAFKA-5020: Update message format in implementatio...

2017-06-16 Thread apurvam
GitHub user apurvam opened a pull request:

https://github.com/apache/kafka/pull/3360

KAFKA-5020: Update message format in implementation docs



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apurvam/kafka KAFKA-5020-message-format-docs-update

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3360.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3360


commit 35e7f55d5067aa503467eb84f5de12fb1ea33b3f
Author: Apurva Mehta 
Date:   2017-06-17T00:10:57Z

Update message format in implementation docs




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (KAFKA-5464) StreamsKafkaClient should not use StreamsConfig.POLL_MS_CONFIG

2017-06-16 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-5464:
--

 Summary: StreamsKafkaClient should not use 
StreamsConfig.POLL_MS_CONFIG
 Key: KAFKA-5464
 URL: https://issues.apache.org/jira/browse/KAFKA-5464
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 0.10.2.1, 0.11.0.0
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax


In {{StreamsKafkaClient}} we use a {{NetworkClient}} internally and call 
{{poll}} using {{StreamsConfig.POLL_MS_CONFIG}} as the timeout.

However, {{StreamsConfig.POLL_MS_CONFIG}} is solely meant to be applied to 
{{KafkaConsumer.poll()}}, and it's incorrect to use it for the 
{{NetworkClient}}. If the config is increased, this can lead to an infinite 
rebalance: the time spent rebalancing on the client side grows and thus the 
client is no longer able to meet broker-enforced timeouts.
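
For illustration, a hedged sketch of how the config is set (assuming the 0.10/0.11-era StreamsConfig.POLL_MS_CONFIG key; the values are examples only):

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    public class PollMsExample {
        static Properties config() {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            // Meant to bound how long StreamThread blocks in KafkaConsumer.poll();
            // per this bug it also becomes the NetworkClient poll timeout inside
            // StreamsKafkaClient, so raising it delays internal request handling.
            props.put(StreamsConfig.POLL_MS_CONFIG, 300000);
            return props;
        }
    }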



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: [VOTE] KIP-160: Augment KStream.print() to allow users pass in extra parameters in the printed string

2017-06-16 Thread Guozhang Wang
Yes. Please update the wiki page of KIP-160 as well, and then recall the
voting process.

For those who already voted, please re-review the wiki page and re-cast
your vote.

Guozhang

On Thu, Jun 15, 2017 at 7:08 PM, James Chain 
wrote:

> @Guozhang
>
> OK, I will add it to writeAsText as well.
> Should I add the writeAsText part to KIP-160?
>
> James Chien
>



-- 
-- Guozhang


Build failed in Jenkins: kafka-trunk-jdk8 #1718

2017-06-16 Thread Apache Jenkins Server
See 


Changes:

[ismael] MINOR: clarify partition option behavior for console consumer

--
[...truncated 4.10 MB...]
at hudson.triggers.SCMTrigger$Runner.runPolling(SCMTrigger.java:596)
at hudson.triggers.SCMTrigger$Runner.run(SCMTrigger.java:642)
at 
hudson.util.SequentialExecutionQueue$QueueEntry.run(SequentialExecutionQueue.java:119)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
ERROR: Could not install GRADLE_3_4_RC_2_HOME
java.lang.NullPointerException
at 
hudson.plugins.toolenv.ToolEnvBuildWrapper$1.buildEnvVars(ToolEnvBuildWrapper.java:46)
at hudson.model.AbstractBuild.getEnvironment(AbstractBuild.java:930)
at hudson.plugins.git.GitSCM.getParamExpandedRepos(GitSCM.java:418)
at 
hudson.plugins.git.GitSCM.compareRemoteRevisionWithImpl(GitSCM.java:624)
at hudson.plugins.git.GitSCM.compareRemoteRevisionWith(GitSCM.java:589)
at hudson.scm.SCM.compareRemoteRevisionWith(SCM.java:392)
at hudson.scm.SCM.poll(SCM.java:409)
at hudson.model.AbstractProject._poll(AbstractProject.java:1463)
at hudson.model.AbstractProject.poll(AbstractProject.java:1366)
at hudson.triggers.SCMTrigger$Runner.runPolling(SCMTrigger.java:596)
at hudson.triggers.SCMTrigger$Runner.run(SCMTrigger.java:642)
at 
hudson.util.SequentialExecutionQueue$QueueEntry.run(SequentialExecutionQueue.java:119)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

[GitHub] kafka pull request #3338: KAFKA-5446: Annoying braces showed on log.error us...

2017-06-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3338


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #3359: KAFKA-5062. Kafka brokers can accept malformed req...

2017-06-16 Thread cmccabe
GitHub user cmccabe opened a pull request:

https://github.com/apache/kafka/pull/3359

KAFKA-5062. Kafka brokers can accept malformed requests which allocate gigabytes of memory
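
A hedged sketch of the general defense (illustrative only, not the actual patch): validate a client-supplied size field against a bound before allocating:

    import java.nio.ByteBuffer;

    public class RequestSizeCheck {
        // Example bound; a real broker limit would be configurable.
        static final int MAX_REQUEST_SIZE = 100 * 1024 * 1024;

        static ByteBuffer readSizeDelimited(ByteBuffer sizeHeader) {
            int size = sizeHeader.getInt();
            // Reject absurd values before allocating, instead of letting a
            // malformed 4-byte length field trigger a multi-gigabyte allocation.
            if (size < 0 || size > MAX_REQUEST_SIZE) {
                throw new IllegalArgumentException("Invalid request size: " + size);
            }
            return ByteBuffer.allocate(size);
        }
    }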

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cmccabe/kafka KAFKA-5062

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3359.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3359


commit 503cfc9ae5f444a46e28b2b5eac6d31b4628134a
Author: Colin P. Mccabe 
Date:   2017-06-14T16:52:08Z

KAFKA-5062. Kafka brokers can accept malformed requests which allocate 
gigabytes of memory




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #3358: KAFKA-5463: Controller incorrectly logs rack infor...

2017-06-16 Thread jeffchao
GitHub user jeffchao opened a pull request:

https://github.com/apache/kafka/pull/3358

KAFKA-5463: Controller incorrectly logs rack information when new brokers 
are added

This change is in response to 
https://issues.apache.org/jira/browse/KAFKA-5463.

When a new broker is added, the `ControllerChannelManager` doesn't log rack 
information even if configured.

This happens because `ControllerChannelManager` always instantiates a 
`Node` using the same constructor whether or not rack awareness is configured, 
which causes some confusion when running with rack-aware replica placement.
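
For context, a small sketch of the distinction using the public org.apache.kafka.common.Node constructors (host name and rack value below are made up):

    import org.apache.kafka.common.Node;

    public class RackDemo {
        public static void main(String[] args) {
            // Rack-unaware constructor: rack stays null, so the state-change log
            // prints "rack: null" even on a rack-configured broker.
            Node withoutRack = new Node(0, "broker-0.example.com", 9092);
            // Rack-aware constructor carries the configured rack through.
            Node withRack = new Node(0, "broker-0.example.com", 9092, "us-east-1a");
            System.out.println(withoutRack); // ... (id: 0 rack: null)
            System.out.println(withRack);    // ... (id: 0 rack: us-east-1a)
        }
    }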

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/heroku/kafka fix-controller-rack-aware-logging

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3358.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3358


commit 9259e141d3655fe58c6e10b22b442a86b5ab8158
Author: Jeff Chao 
Date:   2017-06-16T21:46:21Z

Include rack information on UpdateMetadata request.

When a new broker is added, the ControllerChannelManager doesn't log
rack information even if configured.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (KAFKA-5463) Controller incorrectly logs rack information when new brokers are added

2017-06-16 Thread Jeff Chao (JIRA)
Jeff Chao created KAFKA-5463:


 Summary: Controller incorrectly logs rack information when new 
brokers are added
 Key: KAFKA-5463
 URL: https://issues.apache.org/jira/browse/KAFKA-5463
 Project: Kafka
  Issue Type: Bug
  Components: config, controller
Affects Versions: 0.10.2.1, 0.10.2.0, 0.11.0.0
 Environment: Ubuntu Trusty (14.04.5), Oracle JDK 8
Reporter: Jeff Chao
Priority: Minor


When a new broker is added, on an {{UpdateMetadata}} request, rack information 
won't be present in the state-change log even if configured.

Example:

{{pri=TRACE t=Controller-1-to-broker-0-send-thread at=logger Controller 1 epoch 
1 received response {error_code=0} for a request sent to broker : 
(id: 0 rack: null)}}

This happens because {{ControllerChannelManager}} always instantiates a 
{{Node}} using the same constructor whether or not rack awareness is configured. 
We're happy to contribute a patch since this causes some confusion when running 
with rack-aware replica placement.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] kafka pull request #3227: MINOR: fix quotes for consistent rendering

2017-06-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3227


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #3228: MINOR: (docs) in section, not at section

2017-06-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3228


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #3356: KAFKA-5456: Ensure producer handles old format lar...

2017-06-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3356


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #3357: KAFKA-5413: Log cleaner fails due to large offset ...

2017-06-16 Thread kelvinrutt
GitHub user kelvinrutt opened a pull request:

https://github.com/apache/kafka/pull/3357

KAFKA-5413: Log cleaner fails due to large offset in segment file

The contribution is my original work and I license the work to the project 
under the project's open source license.

@junrao, I had already made the code change before your last comment.
I've done pretty much what you said, except that I've not used the current
segment because I wasn't sure whether it would always be available.
I'm happy to change it if you prefer.
I've run all the unit and integration tests, which all passed.
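
For context, a hedged illustration of the constraint behind the bug (names are illustrative): offsets inside a segment are indexed relative to the segment's base offset as a 4-byte int, so a group of cleaned segments must not span more than Integer.MAX_VALUE offsets:

    public class OffsetDeltaCheck {
        // A segment index can only address offsets whose delta from the base
        // offset fits in a signed 32-bit int.
        static boolean fitsInSegment(long baseOffset, long offset) {
            return offset - baseOffset <= Integer.MAX_VALUE;
        }

        public static void main(String[] args) {
            System.out.println(fitsInSegment(0L, 2147483647L)); // true
            System.out.println(fitsInSegment(0L, 2147483648L)); // false: delta overflows
        }
    }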

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kelvinrutt/kafka kafka_5413_bugfix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3357.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3357


commit 85d7024ea6bc67328a59b5cae3078fd4138df33f
Author: Kelvin Rutt 
Date:   2017-06-15T14:34:04Z

Fix problem with log cleaner where the index under-reports

commit 1adcc8a04af56a02efbeb90874e0545036c1a316
Author: Kelvin Rutt 
Date:   2017-06-15T15:34:28Z

Add comments and remove some unwanted code




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


to be added to the contributor list

2017-06-16 Thread Qingtian Wang
Hello,

Not sure if this is the right list for my request.

I'd like to start contributing by working on simple JIRA issues as
suggested by the Kafka website. Per the instruction:

   - Please contact us to be added to the contributor list

I'd like to be added to the contributor list, but I'm not sure how.

Sorry if this has been asked many times. I am new; please help.

Thanks,
Qingtian


Re: Kafka Streams vs Spark Streaming : reduce by window

2017-06-16 Thread Matthias J. Sax
Thanks Michał!

That is very good feedback.


-Matthias

On 6/16/17 2:38 AM, Michal Borowiecki wrote:
> I wonder if it's a frequent enough use case that Kafka Streams should
> consider providing this out of the box - this was asked for multiple
> times, right?
> 
> Personally, I agree totally with the philosophy of "no final
> aggregation", as expressed by Eno's post, but IMO that is predicated
> totally on event-time semantics.
> 
> If users want processing-time semantics then, as the docs already point
> out, there is no such thing as a late-arriving record - every record
> just falls in the currently open window(s), hence the notion of final
> aggregation makes perfect sense, from the usability point of view.
> 
> The single abstraction of "stream time" proves leaky in some cases (e.g.
> for punctuate method - being addressed in KIP-138). Perhaps this is
> another case where processing-time semantics warrant explicit handling
> in the api - but of course, only if there's sufficient user demand for this.
> 
> What I could imagine is a new type of time window
> (ProcessingTimeWindow?), that if used in an aggregation, the underlying
> processor would force the WallclockTimestampExtractor (KAFKA-4144
> enables that) and would use the system-time punctuation (KIP-138) to
> send the final aggregation value once the window has expired and could
> be configured to not send intermediate updates while the window was open.
> 
> Of course this is just a helper for the users, since they can implement
> it all themselves using the low-level API, as Matthias pointed out
> already. Just seems there's recurring interest in this.
> 
> Again, this only makes sense for processing time semantics. For
> event-time semantics I find the arguments for "no final aggregation"
> totally convincing.
> 
> 
> Cheers,
> 
> Michał
> 
> 
> On 16/06/17 00:08, Matthias J. Sax wrote:
>> Hi Paolo,
>>
>> This SO question might help, too:
>> https://stackoverflow.com/questions/38935904/how-to-send-final-kafka-streams-aggregation-result-of-a-time-windowed-ktable
>>
>> For Streams, the basic model is based on "change" and we report updates
>> to the "current" result immediately reducing latency to a minimum.
>>
>> Last, if you say it's going to fall into the next window, you won't get
>> event-time semantics but fall back to processing-time semantics, which
>> cannot provide exact results
>>
>> If you really want to trade off correctness against getting (late)
>> updates and want to use processing-time semantics, you should configure
>> WallclockTimestampExtractor and implement an "update deduplication"
>> operator using table.toStream().transform(). You can attach a state store to
>> your transformer and store all updates there (i.e., newer updates overwrite
>> older updates). Punctuations allow you to emit "final" results for
>> windows for which "window end time" has passed.
>>
>>
>> -Matthias
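
A rough sketch of the "update deduplication" pattern Matthias describes above, written against the later Punctuator API from KIP-138 for readability; the store name, types, and punctuation interval are illustrative assumptions, not code from this thread:

    import java.time.Duration;
    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.kstream.Transformer;
    import org.apache.kafka.streams.kstream.Windowed;
    import org.apache.kafka.streams.processor.ProcessorContext;
    import org.apache.kafka.streams.processor.PunctuationType;
    import org.apache.kafka.streams.state.KeyValueIterator;
    import org.apache.kafka.streams.state.KeyValueStore;

    public class FinalResultTransformer
            implements Transformer<Windowed<String>, Long, KeyValue<Windowed<String>, Long>> {

        private ProcessorContext context;
        private KeyValueStore<Windowed<String>, Long> latest;

        @Override
        @SuppressWarnings("unchecked")
        public void init(ProcessorContext context) {
            this.context = context;
            this.latest = (KeyValueStore<Windowed<String>, Long>)
                    context.getStateStore("latest-per-window");
            // On stream-time advance, forward results for windows whose end time
            // has passed, then drop them from the store.
            context.schedule(Duration.ofSeconds(10), PunctuationType.STREAM_TIME, timestamp -> {
                try (KeyValueIterator<Windowed<String>, Long> it = latest.all()) {
                    while (it.hasNext()) {
                        KeyValue<Windowed<String>, Long> entry = it.next();
                        if (entry.key.window().end() <= timestamp) {
                            context.forward(entry.key, entry.value);
                            latest.delete(entry.key);
                        }
                    }
                }
            });
        }

        @Override
        public KeyValue<Windowed<String>, Long> transform(Windowed<String> key, Long value) {
            latest.put(key, value); // newer updates overwrite older ones
            return null;            // suppress the intermediate update
        }

        @Override
        public void close() { }
    }

It would be wired in roughly as table.toStream().transform(FinalResultTransformer::new, "latest-per-window"), with a matching state store registered on the topology.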
>>
>> On 6/15/17 9:21 AM, Paolo Patierno wrote:
>>> Hi Eno,
>>>
>>>
>>> regarding closing window I think that it's up to the streaming application. 
>>> I mean ...
>>>
>>> If I want something like I described, I know that a value outside my 5 
>>> seconds window will be taken into account for the next processing (in the 
>>> next 5 seconds). I don't think I'm losing a record, I am aware that this 
>>> record will fall in the next "processing" window. Btw I'll take a look at 
>>> your article ! Thanks !
>>>
>>>
>>> Paolo
>>>
>>>
>>> Paolo Patierno
>>> Senior Software Engineer (IoT) @ Red Hat
>>> Microsoft MVP on Windows Embedded & IoT
>>> Microsoft Azure Advisor
>>>
>>> Twitter : @ppatierno
>>> Linkedin : paolopatierno
>>> Blog : DevExperience
>>>
>>>
>>> 
>>> From: Eno Thereska 
>>> Sent: Thursday, June 15, 2017 3:57 PM
>>> To: us...@kafka.apache.org
>>> Subject: Re: Kafka Streams vs Spark Streaming : reduce by window
>>>
>>> Hi Paolo,
>>>
>>> Yeah, so if you want fewer records, you should actually "not" disable 
>>> cache. If you disable cache you'll get all the records as you described.
>>>
>>> About closing windows: if you close a window and a late record arrives that 
>>> should have been in that window, you basically lose the ability to process 
>>> that record. In Kafka Streams we are robust to that, in that we handle late 
>>> arriving records. There is a comparison here for example when we compare it 
>>> to other methods that depend on watermarks or triggers: 
>>> https://www.confluent.io/blog/watermarks-tables-event-time-dataflow-model/ 
>>> 
>>>
>>> Eno
>>>
>>>
 On 15 Jun 2017, at 14:57, Paolo Patierno  wrote:

 Hi Eno,


 thanks for the reply !

 Regarding the cache I'm already using CACHE_MAX_BYTES_BUFFERING_CONFIG = 0 
 (so disabling cache).

 Regarding the interactive query API (I'll take a l

Re: Improving tools: --help

2017-06-16 Thread Tom Bentley
> I just checked kafka-topics.sh and the only required argument there is
> --zookeeper option. Not sure if you were thinking of some other command.
>
I meant that for kafka-topics.sh --create requires --topic (amongst
others), but --list does not.

> (For example, using these methods we can say something like:
> - A.requiredIf("B") the given option 'A' requires option 'B'
> - A.requiredUnless("B") the given option 'A' needs to be present if option
> 'B' is not present)
>
> I wasn't able to get that to work for that command. I am sure I am missing
> something but will keep working on it.
>
That's a good start if you can get it working. If the fact that A is
required when B is present is only part of the documentation for A, it
doesn't help me when I scan the documentation of B. But maybe JOptSimple
would document this on B?

imho the best way to deal with this sort of situation is with multiple
usages; see for example `git branch --help` -- it's clear and concise which
options are supported/required with which 'actions'.

Cheers,

Tom


Re: Kafka Streams vs Spark Streaming : reduce by window

2017-06-16 Thread Jay Kreps
I think the question is when you actually *want* processing-time
semantics. There are definitely times when it's safe to assume the two are
close enough that a little lossiness doesn't matter much, but it is pretty
hard to make assumptions about what the processing time is, and it has been
hard for us to think of a use case where it's actually desirable.

I think mostly what we've seen is confusion about the core concepts:

   - stream -- immutable events that occur
   - tables (including windows) -- current state of the world

If the root problem is confusion, adding knobs never makes it better. If the
root problem is that we're missing important use cases that justify the
additional knobs, then I think it's good to try to really understand them. I
think there could be use cases around systems that don't take updates;
examples would be email, twitter, and some metrics stores.

One solution that would be less complexity-inducing than allowing new
semantics, but might help with the use cases we need to collect, would be
to add a new operator in the DSL. Something like .freezeAfter(30,
TimeUnit.SECONDS) that collects all updates for a given window and both
emits and enforces a single output 30 seconds after the advancement
of stream time, remembers that it has been emitted, and suppresses all further
output (so the output is actually a KStream). This might or might not
depend on wall clock time. Perhaps this is in fact what you are proposing?

-Jay
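
For concreteness, a hypothetical sketch of the operator Jay describes; freezeAfter() does not exist in Kafka Streams, and the surrounding calls use the 0.10.2/0.11-era windowed count API:

    // Windowed count as of 0.10.2/0.11:
    KTable<Windowed<String>, Long> counts = input
            .groupByKey()
            .count(TimeWindows.of(5 * 60 * 1000L), "counts-store");

    // Proposed (hypothetical): collect updates per window and emit exactly one
    // result 30 seconds of stream time after the window closes, suppressing
    // any later output -- so the result is a KStream, not a KTable.
    KStream<Windowed<String>, Long> finals =
            counts.freezeAfter(30, TimeUnit.SECONDS);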



On Fri, Jun 16, 2017 at 2:38 AM, Michal Borowiecki <
michal.borowie...@openbet.com> wrote:

> I wonder if it's a frequent enough use case that Kafka Streams should
> consider providing this out of the box - this was asked for multiple times,
> right?
>
> Personally, I agree totally with the philosophy of "no final aggregation",
> as expressed by Eno's post, but IMO that is predicated totally on
> event-time semantics.
>
> If users want processing-time semantics then, as the docs already point
> out, there is no such thing as a late-arriving record - every record just
> falls in the currently open window(s), hence the notion of final
> aggregation makes perfect sense, from the usability point of view.
>
> The single abstraction of "stream time" proves leaky in some cases (e.g.
> for punctuate method - being addressed in KIP-138). Perhaps this is another
> case where processing-time semantics warrant explicit handling in the api -
> but of course, only if there's sufficient user demand for this.
>
> What I could imagine is a new type of time window (ProcessingTimeWindow?),
> that if used in an aggregation, the underlying processor would force the
> WallclockTimestampExtractor (KAFKA-4144 enables that) and would use the
> system-time punctuation (KIP-138) to send the final aggregation value once
> the window has expired and could be configured to not send intermediate
> updates while the window was open.
>
> Of course this is just a helper for the users, since they can implement it
> all themselves using the low-level API, as Matthias pointed out already.
> Just seems there's recurring interest in this.
>
> Again, this only makes sense for processing time semantics. For event-time
> semantics I find the arguments for "no final aggregation" totally
> convincing.
>
>
> Cheers,
>
> Michał
>
> On 16/06/17 00:08, Matthias J. Sax wrote:
>
> Hi Paolo,
>
> This SO question might help, 
> too:https://stackoverflow.com/questions/38935904/how-to-send-final-kafka-streams-aggregation-result-of-a-time-windowed-ktable
>
> For Streams, the basic model is based on "change" and we report updates
> to the "current" result immediately reducing latency to a minimum.
>
> Last, if you say it's going to fall into the next window, you won't get
> event-time semantics but fall back to processing-time semantics, which
> cannot provide exact results
>
> If you really want to trade off correctness against getting (late)
> updates and want to use processing-time semantics, you should configure
> WallclockTimestampExtractor and implement an "update deduplication"
> operator using table.toStream().transform(). You can attach a state store to
> your transformer and store all updates there (i.e., newer updates overwrite
> older updates). Punctuations allow you to emit "final" results for
> windows for which "window end time" has passed.
>
>
> -Matthias
>
> On 6/15/17 9:21 AM, Paolo Patierno wrote:
>
> Hi Eno,
>
>
> regarding closing window I think that it's up to the streaming application. I 
> mean ...
>
> If I want something like I described, I know that a value outside my 5 
> seconds window will be taken into account for the next processing (in the 
> next 5 seconds). I don't think I'm losing a record, I am aware that this 
> record will fall in the next "processing" window. Btw I'll take a look at 
> your article ! Thanks !
>
>
> Paolo
>
>
> Paolo Patierno
> Senior Software Engineer (IoT) @ Red Hat
> Microsoft MVP on Windows Embedded & IoT
> Microsoft Azure Advisor
>
> Twitter : @ppatierno


Re: Improving tools: --help

2017-06-16 Thread Mariam John

Hi Tom,

 I just checked kafka-topics.sh and the only required argument there is
--zookeeper option. Not sure if you were thinking of some other command.

But I understand what you mean. One of the places where I had some trouble
with something related is the kafka-console-consumer command where the
required options are based on whether we are using the new consumer or old
consumer. In JOptSimple there is a way to define "Required Dependent
Options" using the requiredIf or requiredUnless methods

(For example, using these methods we can say something like:
- A.requiredIf("B") the given option 'A' requires option 'B'
- A.requiredUnless("B") the given option 'A' needs to be present if option
'B' is not present)
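
A small self-contained sketch of those jopt-simple methods (option names borrowed from the console-consumer example; this is not Kafka's actual parser setup):

    import joptsimple.OptionParser;
    import joptsimple.OptionSet;

    public class RequiredIfDemo {
        public static void main(String[] args) {
            OptionParser parser = new OptionParser();
            parser.accepts("new-consumer");
            // --bootstrap-server is mandatory only if --new-consumer is given.
            parser.accepts("bootstrap-server").requiredIf("new-consumer").withRequiredArg();
            // --zookeeper is mandatory only if --new-consumer is absent.
            parser.accepts("zookeeper").requiredUnless("new-consumer").withRequiredArg();

            // parse() throws OptionException when a dependent requirement is unmet.
            OptionSet options = parser.parse(args);
            System.out.println("parsed: " + options.asMap());
        }
    }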

I wasn't able to get that to work for that command. I am sure I am missing
something but will keep working on it.

Hope this helps.

Regards,
Mariam.




From:   Tom Bentley 
To: dev@kafka.apache.org
Date:   06/16/2017 10:32 AM
Subject:Re: Improving tools: --help



That's a great start Mariam, thanks.

Out of interest, how do you handle required arguments for something like
kafka-topics.sh where what's required depends on the 'action' option?

Cheers,

Tom

On 16 June 2017 at 15:55, Mariam John  wrote:

> Hi Tom,
>
> I just wanted to let you know that I am currently working on KAFKA-2111:
> https://issues.apache.org/jira/browse/KAFKA-2111 and am almost done with
> it. The changes that are included in this patch are:
>
> - add --help option to all the commands where it is missing
> - correctly mark all the required options. I noticed that currently we
> rely on the user to add the REQUIRED wording in the option description. I
> have changed this to use the JOptSimple method required() to mark options
> as required. The other good thing about doing it that way is that when
> the
> parse method is called, it will automatically check if user has specified
> the required arguments (Right now we rely on
> CommandLineUtils.checkRequiredArg
> to do that)
> - Made some minor grammar changes to the option descriptions on some of
> the commands (like ending the description with ., Starting a description
> with caps, some wording changes)
> - Some commands did not have a description of what they do - added those
>
>
> I am also looking at https://issues.apache.org/jira/browse/KAFKA-4220
> since it is related to this. So I am currently trying to include these
> changes in the current patchset for KAFKA-2111. It will include:
> - changes to properly catch OptionExceptions when converting the argument
> to different types and print out the error message and command usage
> information rather than throwing a stacktrace exception.
>
> There are close to 20 commands I have made changes to. I will try to push
> something out by end of day today. So I think I will cover 1) from your
> list below and not 2) and 3).
>
> Thanks Tom.
>
> Regards,
> Mariam.
>
>
>
>
> From: Tom Bentley 
> To: dev@kafka.apache.org
> Date: 06/16/2017 08:06 AM
> Subject: Improving tools: --help
> --
>
>
>
> Hi,
>
> I noticed that the command line tools could use a little love. For
> instance, I was surprised that most of them don't support `--help`, and
> generally there are a few inconsistencies.
>
> KIP-14 is dormant and AFAICS no one is working on
> https://issues.apache.org/jira/browse/KAFKA-2111 either. So if I'm not
> treading on anyone's toes, I would like to improve this situation.
> Specifically I propose to:
>
> 1. Add support for `--help` for all tools with shell scripts.
> 2. Include in the help output on non-Windows systems documentation for
> the
> script-implemented `--daemon`, `--name` and `--loggc` options, because
> currently they're not very discoverable.
> 3. Add more standard equivalents for non-UNIXish options (e.g. add
> `--consumer-config` as well as `--consumer.config`).
>
> I also noticed that overall Kafka has two option parsing dependencies:
> `argparse4j` and `jopt-simple`. Is there a good reason for this? If not
> then I could standardize on one at the same time.
>
> Would this require a KIP? If so should I open a new one, or resurrect
> KIP-14?
>
> Regards,
>
> Tom
>
>
>



[jira] [Created] (KAFKA-5462) Add a configuration for users to specify a template for building a custom principal name

2017-06-16 Thread Koelli Mungee (JIRA)
Koelli Mungee created KAFKA-5462:


 Summary: Add a configuration for users to specify a template for 
building a custom principal name
 Key: KAFKA-5462
 URL: https://issues.apache.org/jira/browse/KAFKA-5462
 Project: Kafka
  Issue Type: Bug
  Components: security
Affects Versions: 0.10.2.1
Reporter: Koelli Mungee


Add a configuration for users to specify a template for building a custom 
principal name.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[DISCUSS] KIP-168: Add TotalTopicCount metric per cluster

2017-06-16 Thread Abhishek Mendhekar
Hi Kafka Dev,

I created KIP-168 to propose adding a metric to emit total topic count
in a cluster. The metric will be emitted by the controller.

The KIP can be found here
(https://cwiki.apache.org/confluence/display/KAFKA/KIP-168%3A+Add+TotalTopicCount+metric+per+cluster)
and the associated JIRA improvement is KAFKA-5461
(https://issues.apache.org/jira/browse/KAFKA-5461)
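
A hedged sketch of how such a controller gauge might be registered with the Yammer metrics library Kafka brokers use (class and metric names here are assumptions, not the KIP's final design):

    import com.yammer.metrics.Metrics;
    import com.yammer.metrics.core.Gauge;
    import java.util.Set;

    public class TopicCountMetric {
        static void register(final Set<String> liveTopics) {
            Metrics.newGauge(TopicCountMetric.class, "TotalTopicCount",
                    new Gauge<Integer>() {
                        @Override
                        public Integer value() {
                            // Re-evaluated on every metrics poll.
                            return liveTopics.size();
                        }
                    });
        }
    }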

Appreciate all the comments.

Best,

Abhishek


[GitHub] kafka pull request #3153: MINOR: clarify partition option behavior for conso...

2017-06-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3153


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (KAFKA-5461) KIP-168: Add TotalTopicCount metric per cluster

2017-06-16 Thread Abhishek Mendhekar (JIRA)
Abhishek Mendhekar created KAFKA-5461:
-

 Summary: KIP-168: Add TotalTopicCount metric per cluster
 Key: KAFKA-5461
 URL: https://issues.apache.org/jira/browse/KAFKA-5461
 Project: Kafka
  Issue Type: Improvement
Reporter: Abhishek Mendhekar
Assignee: Abhishek Mendhekar
 Fix For: 0.11.0.1


See KIP 
[here|https://cwiki.apache.org/confluence/display/KAFKA/KIP-168%3A+Add+TotalTopicCount+metric+per+cluster]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Improving tools: --help

2017-06-16 Thread Tom Bentley
That's a great start Mariam, thanks.

Out of interest, how do you handle required arguments for something like
kafka-topics.sh where what's required depends on the 'action' option?

Cheers,

Tom

On 16 June 2017 at 15:55, Mariam John  wrote:

> Hi Tom,
>
> I just wanted to let you know that I am currently working on KAFKA-2111:
> https://issues.apache.org/jira/browse/KAFKA-2111 and am almost done with
> it. The changes that are included in this patch are:
>
> - add --help option to all the commands where it is missing
> - correctly mark all the required options. I noticed that currently we
> rely on the user to add the REQUIRED wording in the option description. I
> have changed this to use the JOptSimple method required() to mark options
> as required. The other good thing about doing it that way is that when the
> parse method is called, it will automatically check if user has specified
> the required arguments (Right now we rely on CommandLineUtils.checkRequiredArg
> to do that)
> - Made some minor grammar changes to the option descriptions on some of
> the commands (like ending the description with ., Starting a description
> with caps, some wording changes)
> - Some commands did not have a description of what they do - added those
>
>
> I am also looking at https://issues.apache.org/jira/browse/KAFKA-4220
> since it is related to this. So I am currently trying to include these
> changes in the current patchset for KAFKA-2111. It will include:
> - changes to properly catch OptionExceptions when converting the argument
> to different types and print out the error message and command usage
> information rather than throwing a stacktrace exception.
>
> There are close to 20 commands I have made changes to. I will try to push
> something out by end of day today. So I think I will cover 1) from your
> list below and not 2) and 3).
>
> Thanks Tom.
>
> Regards,
> Mariam.
>
>
>
>
> From: Tom Bentley 
> To: dev@kafka.apache.org
> Date: 06/16/2017 08:06 AM
> Subject: Improving tools: --help
> --
>
>
>
> Hi,
>
> I noticed that the command line tools could use a little love. For
> instance, I was surprised that most of them don't support `--help`, and
> generally there are a few inconsistencies.
>
> KIP-14 is dormant and AFAICS no one is working on
> https://issues.apache.org/jira/browse/KAFKA-2111 either. So if I'm not
> treading on anyone's toes, I would like to improve this situation.
> Specifically I propose to:
>
> 1. Add support for `--help` for all tools with shell scripts.
> 2. Include in the help output on non-Windows systems documentation for the
> script-implemented `--daemon`, `--name` and `--loggc` options, because
> currently they're not very discoverable.
> 3. Add more standard equivalents for non-UNIXish options (e.g. add
> `--consumer-config` as well as `--consumer.config`).
>
> I also noticed that overall Kafka has two option parsing dependencies:
> `argparse4j` and `jopt-simple`. Is there a good reason for this? If not
> then I could standardize on one at the same time.
>
> Would this require a KIP? If so should I open a new one, or resurrect
> KIP-14?
>
> Regards,
>
> Tom
>
>
>


Jenkins build is back to normal : kafka-trunk-jdk8 #1716

2017-06-16 Thread Apache Jenkins Server
See 




Re: Improving tools: --help

2017-06-16 Thread Mariam John

Hi Tom,

  I just wanted to let you know that I am currently working on KAFKA-2111:
https://issues.apache.org/jira/browse/KAFKA-2111 and am almost done with
it. The changes that are included in this patch are:

- add --help option to all the commands where it is missing
- correctly mark all the required options. I noticed that currently we rely
on the user to add the REQUIRED wording in the option description. I have
changed this to use the JOptSimple method required() to mark options as
required. The other good thing about doing it that way is that when the
parse method is called, it will automatically check whether the user has
specified the required arguments (right now we rely on
CommandLineUtils.checkRequiredArg to do that)
- Made some minor grammar changes to the option descriptions of some of the
commands (like ending descriptions with a period, starting them with a
capital letter, some wording changes)
- Some commands did not have a description of what they do - added those


I am also looking at https://issues.apache.org/jira/browse/KAFKA-4220 since
it is related to this. So I am currently trying to include these changes in
the current patchset for KAFKA-2111. It will include:
- changes to properly catch OptionExceptions when converting the argument
to different types and print out the error message and command usage
information rather than throwing a stacktrace exception.

There are close to 20 commands I have made changes to. I will try to push
something out by end of day today. So I think I will cover 1) from your
list below and not 2) and 3).

Thanks Tom.

Regards,
Mariam.





From:   Tom Bentley 
To: dev@kafka.apache.org
Date:   06/16/2017 08:06 AM
Subject:Improving tools: --help



Hi,

I noticed that the command line tools could use a little love. For
instance, I was surprised that most of them don't support `--help`, and
generally there are a few inconsistencies.

KIP-14 is dormant and AFAICS no one is working on
https://issues.apache.org/jira/browse/KAFKA-2111 either. So if I'm not
treading on anyone's toes, I would like to improve this situation.
Specifically I propose to:

1. Add support for `--help` for all tools with shell scripts.
2. Include in the help output on non-Windows systems documentation for the
script-implemented `--daemon`, `--name` and `--loggc` options, because
currently they're not very discoverable.
3. Add more standard equivalents for non-UNIXish options (e.g. add
`--consumer-config` as well as `--consumer.config`).

I also noticed that overall Kafka has two option parsing dependencies:
`argparse4j` and `jopt-simple`. Is there a good reason for this? If not
then I could standardize on one at the same time.

Would this require a KIP? If so should I open a new one, or resurrect
KIP-14?

Regards,

Tom



Re: [DISCUSS] KIP-148: Add a connect timeout for client

2017-06-16 Thread David
Hi Colin,
I think the exponential backoff should still apply, thanks for the 
explanation.
thanks,
David


-- Original message --
From: "Colin McCabe"
Date: 2017-06-13 1:43
To: "dev"

Subject: Re: [DISCUSS] KIP-148: Add a connect timeout for client



Just a note: KIP-144 added exponential backoff for broker reconnect
attempts, configured via reconnect.backoff.max.ms.

cheers,
Colin
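
For reference, a small sketch of the two backoff settings on a client (values are examples):

    import java.util.Properties;

    public class BackoffConfig {
        static Properties config() {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            // Wait before the first reconnect attempt to a failed broker...
            props.put("reconnect.backoff.ms", "50");
            // ...then back off exponentially per attempt up to this cap (KIP-144).
            props.put("reconnect.backoff.max.ms", "1000");
            return props;
        }
    }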

On Sat, Jun 10, 2017, at 08:42, David wrote:
> -- Original message --
> From: <254479...@qq.com>
> Date: 2017-06-04 6:05
> To: "dev"
> 
> Subject: Re: [DISCUSS] KIP-148: Add a connect timeout for client
> 
> 
> 
> >I guess one obvious question is, how does this interact with retries? 
> >Does it result in a failure getting delivered to the end user more
> >quickly if connecting is impossible the first few times we try?  Does
> >exponential backoff still apply?
> 
> 
> Yes, for the retries it will let the end user connect more quickly. 
> After the produce request 
> fails because of a timeout, the network client closes the connection and starts
> to connect to the leastLoadedNode.
> If the node has no response, we will quickly close the connection within the
> specified timeout and try another node.
> 
> 
> And for the exponential backoff, do you mean TCP's exponential
> backoff or the NetworkClient's exponential backoff?
> It seems the NetworkClient has no exponential backoff (only the
> reconnect.backoff.ms parameter)
> 
> 
> Thanks
> David
> 
> 
> 
> 
> -- Original message --
> From: "Colin McCabe"
> Date: 2017-05-31 2:44
> To: "dev"
> 
> Subject: Re: [DISCUSS] KIP-148: Add a connect timeout for client
> 
> 
> 
> On Mon, May 29, 2017, at 15:46, Guozhang Wang wrote:
> > On Wed, May 24, 2017 at 9:59 AM, Colin McCabe  wrote:
> > 
> > > On Tue, May 23, 2017, at 19:07, Guozhang Wang wrote:
> > > > I think using a single config to cover end-to-end latency with 
> > > > connecting
> > > > and request round-trip may not be the most appropriate since 1) some request
> > > > may need much more time than others since they are parked (fetch request
> > > > with long polling, join group request etc) or throttled,
> > >
> > > Hmm.  My proposal was to implement _both_ end-to-end timeouts and
> > > per-call timeouts.  In that case, some requests needing much more time
> > > than others should not be a concern, since we can simply set a higher
> > > per-call timeout on the requests we think will need more time.
> > >
> > > > and 2) some
> > > > requests are prerequisite of others, like group request to discover the
> > > > coordinator before the fetch offset request, and implementation wise
> > > > these
> > > > request send/receive is embedded in latter ones, hence it is not clear 
> > > > if
> > > > the `request.timeout.ms` should cover just a single RPC or more.
> > >
> > > As far as I know, the request timeout has always covered a single RPC. If
> > > we want to implement a higher-level timeout that spans multiple RPCs, we
> > > can set the per-call timeouts appropriately.  For example:
> > >
> > > > long deadline = System.currentTimeMillis() + 6;
> > > > callA(callTimeout = deadline - System.currentTimeMillis())
> > > > callB(callTimeout = deadline - System.currentTimeMillis())
> > >
> > >
> I may have misunderstood your previous email. Just clarifying:
> > 
> > 1) On the client we already have some configs for controlling end-to-end
> > timeout, e.g. "max.block.ms" on producer controls how long "send()" and
> > "partitionsFor()" will block for, and inside such API calls multiple
> > request round trips may be sent, and for the first request round trip, a
> > connecting phase may or may not be included. All of these are be covered
> > in
> > this "max.block.ms" timeout today. However, as we discussed before not
> > all
> > request round trips have similar latency expectation, so it is better to
> > make a per-request "request.timeout.ms" and the overall "max.block.ms"
> > would need to be at least the max of them.
> 
> That makes sense.
> 
> Just to be clear, when you say "per-request timeout" are you talking
> about a timeout that can be different for each request?  (This doesn't
> exist today, but has been proposed.)  Or are you talking about
> request.timeout.ms, the single timeout that currently applies to all
> requests in NetworkClient?
> 
> > 
> > 2) Now back to the question of whether we should make "request.timeout.ms"
> > include the potential connection phase as well: assume we are going to add
> > the per-request "request.timeout.ms" as suggested above; then we may still
> > have a tight bound on how long connecting should take. For example, let's say
> > we
> > make "joingroup.request.timeout.ms" (or "fetch.request.timeout.ms" to be
> > large since we want really long polling behavior) to be a large value,
> > say
> > 200 seconds, then if the
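
A minimal runnable sketch of the deadline-propagation pattern discussed above
(callA/callB are hypothetical stand-ins, not the actual NetworkClient API; the
60-second budget is illustrative):

    import org.apache.kafka.common.errors.TimeoutException;

    public class DeadlineExample {
        // Hypothetical stand-ins for two sequential RPCs, each taking a
        // per-call timeout in milliseconds.
        static void callA(long callTimeoutMs) { /* send first request */ }
        static void callB(long callTimeoutMs) { /* send dependent request */ }

        public static void main(String[] args) {
            // Overall end-to-end budget shared by both calls.
            long deadline = System.currentTimeMillis() + 60_000L;

            // Each call gets whatever time remains until the shared deadline.
            callA(deadline - System.currentTimeMillis());

            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0)
                throw new TimeoutException("deadline exceeded before second call");
            callB(remaining);
        }
    }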

[GitHub] kafka pull request #3355: KAFKA-5457: MemoryRecordsBuilder.hasRoomFor should...

2017-06-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3355




[jira] [Resolved] (KAFKA-5457) MemoryRecordsBuilder.hasRoomFor doesn't take into account the headers while computing available space

2017-06-16 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-5457.

Resolution: Fixed

Issue resolved by pull request 3355
[https://github.com/apache/kafka/pull/3355]

> MemoryRecordsBuilder.hasRoomFor doesn't take into account the headers while 
> computing available space
> -
>
> Key: KAFKA-5457
> URL: https://issues.apache.org/jira/browse/KAFKA-5457
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>Priority: Blocker
> Fix For: 0.11.0.0
>
>
> This could result in a BufferFull or similar exception when headers are 
> actually being used. 





Improving tools: --help

2017-06-16 Thread Tom Bentley
Hi,

I noticed that the command line tools could use a little love. For
instance, I was surprised that most of them don't support `--help`, and
generally there are a few inconsistencies.

KIP-14 is dormant and AFAICS no one is working on
https://issues.apache.org/jira/browse/KAFKA-2111 either. So if I'm not
treading on anyone's toes, I would like to improve this situation.
Specifically I propose to:

1. Add support for `--help` for all tools with shell scripts.
2. Include in the help output on non-Windows systems documentation for the
script-implemented `--daemon`, `--name` and `--loggc` options, because
currently they're not very discoverable.
3. Add more standard equivalents for non-UNIXish options (e.g. add
`--consumer-config` as well as `--consumer.config`).

I also noticed that overall Kafka has two option parsing dependencies:
`argparse4j` and `jopt-simple`. Is there a good reason for this? If not
then I could standardize on one at the same time.

Would this require a KIP? If so should I open a new one, or resurrect
KIP-14?

Regards,

Tom
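
A minimal sketch of what (1) could look like with jopt-simple, the parser most
of the shell-script tools already use (the option names here are illustrative,
not those of any real tool):

    import java.io.IOException;
    import joptsimple.OptionParser;
    import joptsimple.OptionSet;

    public class HelpfulTool {
        public static void main(String[] args) throws IOException {
            OptionParser parser = new OptionParser();
            parser.accepts("consumer-config", "Consumer config properties file.")
                  .withRequiredArg().describedAs("file");
            // forHelp() lets parsing succeed even when required options are absent.
            parser.accepts("help", "Print usage information.").forHelp();

            OptionSet options = parser.parse(args);
            if (options.has("help")) {
                parser.printHelpOn(System.out);  // prints every option with its description
                System.exit(0);
            }
            // ... run the tool ...
        }
    }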


Build failed in Jenkins: kafka-trunk-jdk8 #1715

2017-06-16 Thread Apache Jenkins Server
See 


Changes:

[ismael] MINOR: Consolidate Utils.newThread, Utils.daemonThread and KafkaThread

--
[...truncated 4.77 MB...]

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testRestartUnknownConnector STARTED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testRestartUnknownConnector PASSED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testRestartConnectorRedirectToLeader STARTED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testRestartConnectorRedirectToLeader PASSED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testRestartConnectorRedirectToOwner STARTED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testRestartConnectorRedirectToOwner PASSED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testRestartUnknownTask STARTED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testRestartUnknownTask PASSED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testRestartTaskRedirectToLeader STARTED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testRestartTaskRedirectToLeader PASSED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testRestartTaskRedirectToOwner STARTED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testRestartTaskRedirectToOwner PASSED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testConnectorConfigAdded STARTED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testConnectorConfigAdded PASSED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testConnectorConfigUpdate STARTED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testConnectorConfigUpdate PASSED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testConnectorPaused STARTED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testConnectorPaused PASSED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testConnectorResumed STARTED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testConnectorResumed PASSED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testUnknownConnectorPaused STARTED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testUnknownConnectorPaused PASSED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testConnectorPausedRunningTaskOnly STARTED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testConnectorPausedRunningTaskOnly PASSED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testConnectorResumedRunningTaskOnly STARTED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testConnectorResumedRunningTaskOnly PASSED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testTaskConfigAdded STARTED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testTaskConfigAdded PASSED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testJoinLeaderCatchUpFails STARTED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testJoinLeaderCatchUpFails PASSED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testInconsistentConfigs STARTED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testInconsistentConfigs PASSED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testDestroyConnector STARTED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testDestroyConnector PASSED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testRestartTask STARTED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testRestartTask PASSED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testCreateConnectorFailedCustomValidation STARTED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testCreateConnectorFailedCustomValidation PASSED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testCreateConnectorFailedBasicValidation STARTED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testCreateConnectorFailedBasicValidation PASSED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testCreateConnector STARTED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testCreateConnector PASSED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testAccessors STARTED
org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > testAccessors PASSED
[remainder of test log truncated]

[jira] [Created] (KAFKA-5460) Documentation on website uses word-breaks resulting in confusion

2017-06-16 Thread Karel Vervaeke (JIRA)
Karel Vervaeke created KAFKA-5460:
-

 Summary: Documentation on website uses word-breaks resulting in 
confusion
 Key: KAFKA-5460
 URL: https://issues.apache.org/jira/browse/KAFKA-5460
 Project: Kafka
  Issue Type: Bug
Reporter: Karel Vervaeke
 Attachments: Screen Shot 2017-06-16 at 14.45.40.png

Documentation seems to suggest there is a configuration property 
auto.off-set.reset but it really is auto.offset.reset.

We should look into disabling the word-break CSS properties (globally, or at 
least in the configuration reference tables).





[GitHub] kafka pull request #3125: KAFKA-5311: Support ExtendedDeserializer in Kafka ...

2017-06-16 Thread subnova
Github user subnova closed the pull request at:

https://github.com/apache/kafka/pull/3125




[GitHub] kafka pull request #3351: MINOR: Add --help to DumpLogSegments

2017-06-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3351




[GitHub] kafka pull request #3350: MINOR: Redirected the creation of new Thread to Ka...

2017-06-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3350




[jira] [Created] (KAFKA-5459) Support kafka-console-producer.sh messages as whole file

2017-06-16 Thread Tom Bentley (JIRA)
Tom Bentley created KAFKA-5459:
--

 Summary: Support kafka-console-producer.sh messages as whole file
 Key: KAFKA-5459
 URL: https://issues.apache.org/jira/browse/KAFKA-5459
 Project: Kafka
  Issue Type: Improvement
  Components: tools
Affects Versions: 0.10.2.1
Reporter: Tom Bentley
Priority: Trivial


{{kafka-console-producer.sh}} treats each line read as a separate message. This 
can be controlled using the {{--line-reader}} option and the corresponding 
{{MessageReader}} trait. It would be useful to have built-in support for 
sending the whole input stream/file as the message. 
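
A minimal sketch of such a reader, assuming the 0.10.x-era 
{{kafka.common.MessageReader}} trait (later releases moved this API; error 
handling is simplified):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Properties;
    import kafka.common.MessageReader;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class WholeFileMessageReader implements MessageReader {
        private InputStream input;
        private String topic;
        private boolean done = false;

        @Override
        public void init(InputStream inputStream, Properties props) {
            this.input = inputStream;
            this.topic = props.getProperty("topic");
        }

        @Override
        public ProducerRecord<byte[], byte[]> readMessage() {
            if (done)
                return null;  // null signals end of input to the console producer
            try {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                for (int n = input.read(chunk); n != -1; n = input.read(chunk))
                    buf.write(chunk, 0, n);
                done = true;
                return new ProducerRecord<>(topic, buf.toByteArray());
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }

        @Override
        public void close() { }
    }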







Re: Kafka Streams vs Spark Streaming : reduce by window

2017-06-16 Thread Michal Borowiecki
I wonder if it's a frequent enough use case that Kafka Streams should 
consider providing this out of the box - this was asked for multiple 
times, right?


Personally, I agree totally with the philosophy of "no final 
aggregation", as expressed by Eno's post, but IMO that is predicated 
entirely on event-time semantics.


If users want processing-time semantics then, as the docs already point 
out, there is no such thing as a late-arriving record - every record 
just falls in the currently open window(s), hence the notion of final 
aggregation makes perfect sense, from the usability point of view.


The single abstraction of "stream time" proves leaky in some cases (e.g. 
for the punctuate method, being addressed in KIP-138). Perhaps this is 
another case where processing-time semantics warrant explicit handling 
in the API - but of course, only if there's sufficient user demand for this.


What I could imagine is a new type of time window 
(ProcessingTimeWindow?) which, if used in an aggregation, would force the 
WallclockTimestampExtractor (KAFKA-4144 enables that) and use the 
system-time punctuation (KIP-138) to send the final aggregation value 
once the window has expired, and could be configured to not send 
intermediate updates while the window is open.
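
For the extractor part, this is already possible globally today; a minimal 
sketch, assuming the default.timestamp.extractor config key of later Kafka 
Streams releases (earlier releases used timestamp.extractor):

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.processor.WallclockTimestampExtractor;

    public class ProcessingTimeProps {
        public static Properties create() {
            Properties props = new Properties();
            // Stamp every record with wall-clock time on arrival,
            // i.e. processing-time semantics.
            props.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG,
                      WallclockTimestampExtractor.class.getName());
            return props;
        }
    }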


Of course this is just a helper for the users, since they can implement 
it all themselves using the low-level API, as Matthias pointed out 
already. Just seems there's recurring interest in this.


Again, this only makes sense for processing time semantics. For 
event-time semantics I find the arguments for "no final aggregation" 
totally convincing.



Cheers,

Michał


On 16/06/17 00:08, Matthias J. Sax wrote:

Hi Paolo,

This SO question might help, too:
https://stackoverflow.com/questions/38935904/how-to-send-final-kafka-streams-aggregation-result-of-a-time-windowed-ktable

For Streams, the basic model is based on "change", and we report updates
to the "current" result immediately, reducing latency to a minimum.

Last, if you say it's going to fall into the next window, you won't get
event-time semantics but fall back to processing-time semantics, which
cannot provide exact results.

If you really want to trade off correctness against getting (late)
updates and want to use processing-time semantics, you should configure
WallclockTimestampExtractor and implement an "update deduplication"
operator using table.toStream().transform(). You can attach a state store
to your transformer and store all updates there (i.e., newer updates
overwrite older ones). Punctuations allow you to emit "final" results for
windows whose "window end time" has passed.
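
A minimal sketch of that transformer, assuming the post-KIP-138 Punctuator 
API and a pre-created state store named "window-dedup" (serde setup and 
topology wiring omitted; all names are illustrative):

    import java.time.Duration;
    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.kstream.Transformer;
    import org.apache.kafka.streams.kstream.Windowed;
    import org.apache.kafka.streams.processor.ProcessorContext;
    import org.apache.kafka.streams.processor.PunctuationType;
    import org.apache.kafka.streams.state.KeyValueIterator;
    import org.apache.kafka.streams.state.KeyValueStore;

    // Buffers the latest update per window and emits it only once the window
    // end has passed on the wall clock; intermediate updates are swallowed.
    public class FinalResultTransformer
            implements Transformer<Windowed<String>, Long, KeyValue<Windowed<String>, Long>> {

        private ProcessorContext context;
        private KeyValueStore<Windowed<String>, Long> store;

        @Override
        @SuppressWarnings("unchecked")
        public void init(final ProcessorContext context) {
            this.context = context;
            this.store = (KeyValueStore<Windowed<String>, Long>)
                    context.getStateStore("window-dedup");
            // Wall-clock punctuation: every second, forward and evict closed windows.
            context.schedule(Duration.ofSeconds(1), PunctuationType.WALL_CLOCK_TIME, timestamp -> {
                try (KeyValueIterator<Windowed<String>, Long> it = store.all()) {
                    while (it.hasNext()) {
                        final KeyValue<Windowed<String>, Long> entry = it.next();
                        if (entry.key.window().end() <= timestamp) {
                            context.forward(entry.key, entry.value);  // the "final" result
                            store.delete(entry.key);
                        }
                    }
                }
            });
        }

        @Override
        public KeyValue<Windowed<String>, Long> transform(final Windowed<String> key, final Long value) {
            store.put(key, value);  // newer updates overwrite older ones
            return null;            // emit nothing until the window closes
        }

        @Override
        public void close() { }
    }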


-Matthias

On 6/15/17 9:21 AM, Paolo Patierno wrote:

Hi Eno,


regarding closing windows, I think it's up to the streaming application. I 
mean ...

If I want something like I described, I know that a value outside my 5-second 
window will be taken into account for the next processing (in the next 5 
seconds). I don't think I'm losing a record; I'm aware that this record will 
fall into the next "processing" window. Btw I'll take a look at your article! 
Thanks!


Paolo


Paolo Patierno
Senior Software Engineer (IoT) @ Red Hat
Microsoft MVP on Windows Embedded & IoT
Microsoft Azure Advisor

Twitter : @ppatierno
Linkedin : paolopatierno
Blog : DevExperience



From: Eno Thereska 
Sent: Thursday, June 15, 2017 3:57 PM
To: us...@kafka.apache.org
Subject: Re: Kafka Streams vs Spark Streaming : reduce by window

Hi Paolo,

Yeah, so if you want fewer records, you should actually "not" disable the 
cache. If you disable the cache, you'll get all the records, as you described.
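
(For reference, a minimal sketch of the two StreamsConfig settings that 
control caching and flush frequency; the values below are illustrative only:)

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    public class CachingProps {
        public static Properties create() {
            Properties props = new Properties();
            // 10 MB record cache: coalesces repeated updates per key
            // before they are forwarded downstream.
            props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 10 * 1024 * 1024L);
            // Commit every 30s: an upper bound on how long an update is held back.
            props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 30 * 1000L);
            return props;
        }
    }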

About closing windows: if you close a window and a late record arrives that should 
have been in that window, you basically lose the ability to process that record. In 
Kafka Streams we are robust to that: we handle late-arriving records. There is a 
comparison here, for example, with other methods that depend on watermarks or 
triggers: 
https://www.confluent.io/blog/watermarks-tables-event-time-dataflow-model/ 


Eno



On 15 Jun 2017, at 14:57, Paolo Patierno  wrote:

Hi Eno,


thanks for the reply!

Regarding the cache, I'm already using CACHE_MAX_BYTES_BUFFERING_CONFIG = 0 (so 
the cache is disabled).

Regarding the interactive query API (I'll take a look), it means it's up to 
the application to do something like what we get out of the box with Spark.

May I ask what you mean by "We don't believe in closing windows"? Isn't it 
much more code the user has to write to get the same result?

I'm exploring Kafka Streams and it's very powerful IMHO, also because the usage 
is pretty simple, but this scenario could be a gap compared to Spark.


Thanks,

Paolo.


Paolo Patierno
Senior Software Eng

[jira] [Resolved] (KAFKA-5426) One second delay in kafka-console-producer

2017-06-16 Thread Paolo Patierno (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paolo Patierno resolved KAFKA-5426.
---
Resolution: Not A Bug

> One second delay in kafka-console-producer
> --
>
> Key: KAFKA-5426
> URL: https://issues.apache.org/jira/browse/KAFKA-5426
> Project: Kafka
>  Issue Type: Bug
>  Components: config, producer 
>Affects Versions: 0.10.2.1
> Environment: Ubuntu 16.04 with OpenJDK-8 and Ubuntu 14.04 with 
> OpenJDK-7
>Reporter: Francisco Robles Martin
>Assignee: Paolo Patierno
>Priority: Minor
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> Hello!
> I have been trying to change the default delay of the original 
> kafka-console-producer, both by adding a producer.properties file with a 
> different configuration for linger.ms and batch.size, and by providing the 
> settings directly on the command line with "--property", but nothing works. 
> I have also tried it in a VM with Ubuntu 14.04 and Kafka version 0.8.2.1, 
> with the same result. I don't know if it was designed so that the 
> console-producer's behaviour cannot be changed, or if this is a bug.
> Here you can see my original post on StackOverflow asking for help with this 
> issue: 
> https://stackoverflow.com/questions/44334304/kafka-spark-streaming-constant-delay-of-1-second


