Re: [DISCUSS] KIP-35 - Retrieve protocol version

2016-03-16 Thread Magnus Edenhill
1. But there is no way of conveying a version mismatch in the protocol; that's
one of the reasons for this KIP in the first place :)
Even if there were (e.g., the empty-response hack), it makes the client
implementation more complex, since the cached cluster-level version support
returned by the broker can't really be trusted: there is an initial "we can
probably use this version" state, followed by either a "phew, the request
succeeded" or a "darn, seems the request wasn't supported" state.

Additionally, these late errors are going to be a problem in some cases.
Let's say a certain client really needs OffsetCommit v9, so it starts a
consumer, processes a bunch of messages and then attempts to commit the
latest processed offset, only to find that the broker didn't support v9,
thus failing the commit. We then have a situation where messages were
processed but can't be committed, resulting in duplicate processing or
similar issues.

Re caching:
A couple of years back we had the discussion on cached cluster ISR info in
the Metadata response: since the information was cached it couldn't be
trusted and was thus deemed useless. Let's not make that mistake again.

Re in Metadata or not:
I'm personally fine with having a specific request to query the current
broker's API support; the client will fire off both the MetadataRequest and
the ApiQueryRequest back to back, so there is no latency penalty. This is a
cleaner solution and I'm a weak +1 for it unless other client devs oppose.
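
A minimal sketch of that back-to-back bootstrap, assuming a hypothetical
ApiQueryRequest as named in this thread (all identifiers below are
illustrative, not the final KIP-35 protocol):

{code}
// Minimal sketch, assuming a hypothetical per-broker ApiQueryRequest as
// discussed above (names are illustrative, not the final KIP-35 wire format).
// The point is that the client writes both requests before reading either
// response, so the extra version query costs no additional round-trip.
final class MetadataRequest {}
final class ApiQueryRequest {}

interface BrokerChannel {
    int send(Object request);                 // non-blocking write; returns a correlation id
    Object awaitResponse(int correlationId);  // blocks until the matching response arrives
}

final class ConnectionBootstrap {
    static void bootstrap(BrokerChannel channel) {
        int metadataCorr = channel.send(new MetadataRequest()); // request 1
        int versionsCorr = channel.send(new ApiQueryRequest()); // request 2, back to back
        Object clusterMetadata = channel.awaitResponse(metadataCorr);
        Object apiSupport = channel.awaitResponse(versionsCorr);
        // Feature gating happens here, before any produce/fetch/commit is attempted.
    }
}
{code}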


/Magnus


2016-03-16 0:37 GMT+01:00 Jay Kreps :

> Yeah, I think there are two possible approaches:
> 1. You get the versions in the metadata request and cache them, and
> invalidate that cache if you get a version-mismatch error (basically as
> we do with leadership information).
> 2. You check each connection.
>
> I think combining the metadata request and the version check only makes
> sense in (1), right? If it is (2), I don't see how you save anything, and
> the requests don't really make sense because you're mixing cluster-wide
> state about partitions with info about the answering broker.
>
> -Jay
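
A rough sketch of approach (1) in the message above, with purely
illustrative names: the client caches the versions learned from the
metadata response and drops the cache whenever a broker answers with a
version-mismatch error, much like stale leadership metadata is refreshed:

{code}
// Hedged sketch of approach (1): cache the versions learned via metadata and
// invalidate on a version-mismatch error. All names here are illustrative.
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

final class ApiVersionCache {
    // apiKey -> maximum version the cluster reported as supported
    private final AtomicReference<Map<Short, Short>> supported = new AtomicReference<>();

    void update(Map<Short, Short> fromMetadata) {
        supported.set(fromMetadata);
    }

    boolean mightSupport(short apiKey, short version) {
        Map<Short, Short> cached = supported.get();
        Short max = (cached == null) ? null : cached.get(apiKey);
        return max != null && max >= version; // "probably", until a request proves otherwise
    }

    // Called when a broker rejects a request with a version-mismatch error:
    // drop the cache and trigger a metadata refresh, as with stale leadership info.
    void invalidate() {
        supported.set(null);
    }
}
{code}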
>
> On Tue, Mar 15, 2016 at 4:25 PM, Magnus Edenhill wrote:
> > Hey Jay,
> >
> > as discussed earlier it is not safe to cache/relay a broker's version or
> > its supported API versions: by the time the client connects, the broker
> > might have upgraded to another version, which effectively makes this
> > information useless in a cached form.
> >
> > The complexity of querying for the protocol version is very
> > implementation dependent and hard to generalize on. I don't foresee any
> > bigger problems adding support for an extra protocol-version-querying
> > state in librdkafka, but other client devs should chime in. There are
> > already post-connect, pre-operation states for dealing with SSL and SASL.
> >
> > The reason for putting the API versioning stuff in the Metadata request
> > is that it is already used for bootstrapping a client and/or connection
> > and thus saves us a round-trip (and possibly a state).
> >
> >
> > For how this will be used: I can't speak for other client devs, but I
> > aim to make a mapping between the features my client exposes and a set
> > of specific APIs and their minimum versions.
> > E.g.: balanced consumer groups require JoinGroup >= V0, LeaveGroup >= V0,
> > SyncGroup >= V0, and so on.
> > If those requirements can be fulfilled then the feature is enabled,
> > otherwise an error is returned to the user.
> >
> > /Magnus
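
The feature-to-API mapping described above could look roughly like the
following sketch; ApiKey, Feature and brokerSupport are illustrative names,
and the broker-advertised versions would come from whatever request KIP-35
ends up defining:

{code}
// Hedged sketch of the feature-to-API mapping described above. The names
// (ApiKey, Feature, brokerSupport) are illustrative, not real client code.
import java.util.Map;

enum ApiKey { JOIN_GROUP, LEAVE_GROUP, SYNC_GROUP, HEARTBEAT, OFFSET_COMMIT }

final class Feature {
    final String name;
    final Map<ApiKey, Short> minVersions; // API -> minimum version this feature needs

    Feature(String name, Map<ApiKey, Short> minVersions) {
        this.name = name;
        this.minVersions = minVersions;
    }

    // brokerSupport: API -> maximum version the connected broker advertises
    boolean supportedBy(Map<ApiKey, Short> brokerSupport) {
        return minVersions.entrySet().stream()
                .allMatch(e -> brokerSupport.getOrDefault(e.getKey(), (short) -1) >= e.getValue());
    }
}
{code}

A "balanced consumer groups" feature would then declare JoinGroup >= V0,
LeaveGroup >= V0 and SyncGroup >= V0, and the client would disable it with a
clear error whenever supportedBy() returns false.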
> >
> >
> > 2016-03-15 23:35 GMT+01:00 Jay Kreps :
> >
> >> Hey Ashish,
> >>
> >> Can you expand in the proposal on how this would be used by clients?
> >> This proposal only has one slot for api versions, though in fact there
> >> is potentially a different version on each broker. I think the
> >> proposal is that every single time the client establishes a connection
> >> it would then need to issue a metadata request on that connection to
> >> check supported versions. Is that correct?
> >>
> >> The point of merging version information with the metadata request was
> >> that the client wouldn't have to manage this additional state for each
> >> connection, but rather the broker would gather the information and
> >> give a summary of all brokers in the cluster. (Managing the state
> >> doesn't seem complex, but the full state machine for a request is
> >> something like: begin connecting => connection complete => begin
> >> sending request => do work sending => await response => do work
> >> reading response. Adding to the state machine around this is not as
> >> simple as it seems; you can see the code in the Java client around this.)
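
To make the added complexity concrete, the per-request state machine
sketched above, extended with the extra version-check step under discussion,
might look like this (illustrative only):

{code}
// Illustrative only: the request/connection states listed above, plus the
// one extra state a client would need if it must query API versions per connection.
enum RequestState {
    CONNECTING,             // begin connecting
    CONNECTED,              // connection complete
    CHECKING_API_VERSIONS,  // the additional post-connect step discussed in this thread
    SENDING_REQUEST,        // begin sending request / do work sending
    AWAITING_RESPONSE,      // await response
    READING_RESPONSE        // do work reading response
}
{code}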
> >>
> >> It sounds like in this proposal you are proposing merging with the
> >> metadata request but not summarizing across the cluster? Can you
> >> explain the thinking vs a separate request?
> >>
> >> It would really be good if the KIP can summarize the whole interaction
> >> and how clients will work.
> >>
> >> -Jay
> >>
> >> On Tue, Mar 15, 2016 at 3:24 PM, Ashish 

[jira] [Closed] (KAFKA-3401) Message format change on the fly breaks 0.9 consumer

2016-03-16 Thread Eno Thereska (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eno Thereska closed KAFKA-3401.
---

> Message format change on the fly breaks 0.9 consumer
> 
>
> Key: KAFKA-3401
> URL: https://issues.apache.org/jira/browse/KAFKA-3401
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>Reporter: Eno Thereska
>Assignee: Jiangjie Qin
>Priority: Blocker
> Attachments: 2016-03-15--009.zip
>
>
> The new system test added as part of KAFKA-3202 reveals a problem when the
> message format is changed on the fly. When the cluster is running 0.10.x
> brokers, and producers and consumers use version 0.9.0.1, an error happens
> when the message format is changed on the fly to version 0.9:
> {code}
> Exception: {'ConsoleConsumer-worker-1': Exception('Unexpected message format 
> (expected an integer). Message: null',)}
> {code}





[jira] [Created] (KAFKA-3411) Streams: rename job.id to application.id

2016-03-16 Thread Michael Noll (JIRA)
Michael Noll created KAFKA-3411:
---

 Summary: Streams: rename job.id to application.id
 Key: KAFKA-3411
 URL: https://issues.apache.org/jira/browse/KAFKA-3411
 Project: Kafka
  Issue Type: Improvement
  Components: kafka streams
Affects Versions: 0.10.0.0
Reporter: Michael Noll
Priority: Minor


Background: We stopped using the terminology of a "job" in the context of Kafka
Streams.  For example, the upcoming Streams docs do not refer to a "job"
anymore, because the term is very confusing to readers who are familiar with
"jobs" in Hadoop, Spark, Storm, etc.; there is no equivalent concept of a "job"
in Streams.

We should update the Streams code (see "cd streams/ ; git grep job" for a
starting point) to reflect this accordingly.  Notably, the configuration option
"job.id" should be changed to "application.id".





[jira] [Updated] (KAFKA-3411) Streams: stop using "job" terminology, rename job.id to application.id

2016-03-16 Thread Michael Noll (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Noll updated KAFKA-3411:

Summary: Streams: stop using "job" terminology, rename job.id to 
application.id  (was: Streams: rename job.id to application.id)

> Streams: stop using "job" terminology, rename job.id to application.id
> --
>
> Key: KAFKA-3411
> URL: https://issues.apache.org/jira/browse/KAFKA-3411
> Project: Kafka
>  Issue Type: Improvement
>  Components: kafka streams
>Affects Versions: 0.10.0.0
>Reporter: Michael Noll
>Priority: Minor
>
> Background: We stopped using the terminology of a "job" in the context of 
> Kafka Streams.  For example, the upcoming Streams docs do not refer to a 
> "job" anymore; otherwise it's very confusing to readers that are familiar 
> with "jobs" in Hadoop, Spark, Storm, etc. because there's no equivalent 
> concept of a "job" in Streams.
> We should update the Streams code (see "cd streams/ ; git grep job" for a 
> starting point) to reflect this accordingly.  Notably, the configuration 
> option "job.id" should be changed to "application.id".





[GitHub] kafka pull request: KAFKA-3411: Streams: stop using "job" terminol...

2016-03-16 Thread miguno
GitHub user miguno opened a pull request:

https://github.com/apache/kafka/pull/1081

KAFKA-3411: Streams: stop using "job" terminology, rename job.id to 
application.id

@guozhangwang @ymatsuda : please review.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/miguno/kafka KAFKA-3411

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1081.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1081


commit 6bb7be2dc02fc8c3c6afabce05c1e8bac2426449
Author: Michael G. Noll 
Date:   2016-03-16T11:15:08Z

KAFKA-3411: Streams: stop using "job" terminology, rename job.id to 
application.id






[jira] [Commented] (KAFKA-3411) Streams: stop using "job" terminology, rename job.id to application.id

2016-03-16 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15197199#comment-15197199 ]

ASF GitHub Bot commented on KAFKA-3411:
---

GitHub user miguno opened a pull request:

https://github.com/apache/kafka/pull/1081

KAFKA-3411: Streams: stop using "job" terminology, rename job.id to 
application.id

@guozhangwang @ymatsuda : please review.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/miguno/kafka KAFKA-3411

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1081.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1081


commit 6bb7be2dc02fc8c3c6afabce05c1e8bac2426449
Author: Michael G. Noll 
Date:   2016-03-16T11:15:08Z

KAFKA-3411: Streams: stop using "job" terminology, rename job.id to 
application.id




> Streams: stop using "job" terminology, rename job.id to application.id
> --
>
> Key: KAFKA-3411
> URL: https://issues.apache.org/jira/browse/KAFKA-3411
> Project: Kafka
>  Issue Type: Improvement
>  Components: kafka streams
>Affects Versions: 0.10.0.0
>Reporter: Michael Noll
>Priority: Minor
>
> Background: We stopped using the terminology of a "job" in the context of 
> Kafka Streams.  For example, the upcoming Streams docs do not refer to a 
> "job" anymore; otherwise it's very confusing to readers that are familiar 
> with "jobs" in Hadoop, Spark, Storm, etc. because there's no equivalent 
> concept of a "job" in Streams.
> We should update the Streams code (see "cd streams/ ; git grep job" for a 
> starting point) to reflect this accordingly.  Notably, the configuration 
> option "job.id" should be changed to "application.id".


