Re: [DISCUSS] KIP-939: Support Participation in 2PC

2023-08-25 Thread Roger Hoover
Other than supporting multiplexing transactional streams on a single
producer, I don't see how to improve it.

On Thu, Aug 24, 2023 at 12:12 PM Artem Livshits
 wrote:

> Hi Roger,
>
> Thank you for summarizing the cons.  I agree and I'm curious what would be
> the alternatives to solve these problems better and if they can be
> incorporated into this proposal (or built independently in addition to or
> on top of this proposal).  E.g. one potential extension we discussed
> earlier in the thread could be multiplexing logical transactional "streams"
> with a single producer.
>
> -Artem
>
> On Wed, Aug 23, 2023 at 4:50 PM Roger Hoover 
> wrote:
>
> > Thanks.  I like that you're moving Kafka toward supporting this
> dual-write
> > pattern.  Each use case needs to consider the tradeoffs.  You already
> > summarized the pros very well in the KIP.  I would summarize the cons
> > as follows:
> >
> > - you sacrifice availability - each write requires both DB and Kafka to
> > be available, so overall application availability is roughly (1 - p(DB is
> > unavailable)) * (1 - p(Kafka is unavailable)), assuming the two fail
> > independently.
> > - latency will be higher and throughput lower - each write requires both
> > writes to DB and Kafka while holding an exclusive lock in DB.
> > - you need to create a producer per unit of concurrency in your app which
> > has some overhead in the app and Kafka side (number of connections, poor
> > batching).  I assume the producers would need to be configured for low
> > latency (linger.ms=0)
> > - there's some complexity in managing stable transactional ids for each
> > producer/concurrency unit in your application.  With k8s deployment, you
> > may need to switch to something like a StatefulSet that gives each pod a
> > stable identity across restarts.  On top of that pod identity which you
> can
> > use as a prefix, you then assign unique transactional ids to each
> > concurrency unit (thread/goroutine).
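The availability tradeoff in the first bullet can be sanity-checked with a quick sketch (illustrative only; assumes the DB and Kafka fail independently):

```python
# Illustrative sketch (not from the KIP): availability of a dual write that
# needs BOTH the DB and Kafka, assuming the two fail independently.

def dual_write_availability(p_db_down: float, p_kafka_down: float) -> float:
    """A write succeeds only if both systems are up at the same time."""
    return (1 - p_db_down) * (1 - p_kafka_down)

# Example: each system is unavailable 0.1% of the time.
print(round(dual_write_availability(0.001, 0.001), 6))  # 0.998001
```

The combined system is always slightly less available than either dependency alone, which is the cost being described.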
> >
> > On Wed, Aug 23, 2023 at 12:53 PM Artem Livshits
> >  wrote:
> >
> > > Hi Roger,
> > >
> > > Thank you for the feedback.  You make a very good point that we also
> > > discussed internally.  Adding support for multiple concurrent
> > > transactions in one producer could be valuable but it seems to be a
> > fairly
> > > large and independent change that would deserve a separate KIP.  If
> such
> > > support is added we could modify 2PC functionality to incorporate that.
> > >
> > > > Maybe not too bad but a bit of pain to manage these ids inside each
> > > process and across all application processes.
> > >
> > > I'm not sure if supporting multiple transactions in one producer would
> > make
> > > id management simpler: we'd need to store a piece of data per
> > transaction,
> > > so whether it's N producers with a single transaction or N transactions
> > > with a single producer, it's still roughly the same amount of data to
> > > manage.  In fact, managing transactional ids (current proposal) might
> be
> > > easier, because the id is controlled by the application and it knows
> how
> > to
> > > complete the transaction after crash / restart; while a TID would be
> > > generated by Kafka and that would create a question of starting Kafka
> > > transaction, but not saving its TID and then crashing, then figuring
> out
> > > which transactions to abort, and so on.
> > >
> > > > 2) creating a separate producer for each concurrency slot in the
> > > application
> > >
> > > This is a very valid concern.  Maybe we'd need to have some
> multiplexing
> > of
> > > transactional logical "streams" over the same connection.  Seems like a
> > > separate KIP, though.
> > >
> > > > Otherwise, it seems you're left with single-threaded model per
> > > application process?
> > >
> > > That's a fair assessment.  Not necessarily exactly single-threaded per
> > > application, but a single producer per thread model (i.e. an
> application
> > > could have a pool of threads + producers to increase concurrency).
> > >
> > > -Artem
> > >
> > > On Tue, Aug 22, 2023 at 7:22 PM Roger Hoover 
> > > wrote:
> > >
> > > > Artem,
> > > >
> > > > Thanks for the reply.
> > > >
> > > > If I understand correctly, Kafka does not support concurrent
> > transactions
> > > > from the same producer (transactional id).  I think this means that

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2023-08-23 Thread Roger Hoover
Thanks.  I like that you're moving Kafka toward supporting this dual-write
pattern.  Each use case needs to consider the tradeoffs.  You already
summarized the pros very well in the KIP.  I would summarize the cons
as follows:

- you sacrifice availability - each write requires both DB and Kafka to be
available, so overall application availability is roughly (1 - p(DB is
unavailable)) * (1 - p(Kafka is unavailable)), assuming independent failures.
- latency will be higher and throughput lower - each write requires both
writes to DB and Kafka while holding an exclusive lock in DB.
- you need to create a producer per unit of concurrency in your app which
has some overhead in the app and Kafka side (number of connections, poor
batching).  I assume the producers would need to be configured for low
latency (linger.ms=0)
- there's some complexity in managing stable transactional ids for each
producer/concurrency unit in your application.  With k8s deployment, you
may need to switch to something like a StatefulSet that gives each pod a
stable identity across restarts.  On top of that pod identity which you can
use as a prefix, you then assign unique transactional ids to each
concurrency unit (thread/goroutine).
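A minimal sketch of the id scheme just described (illustrative; the prefix format and names are assumptions, not part of the KIP): a stable pod identity, such as a StatefulSet ordinal, prefixes a per-concurrency-slot index.

```python
# Sketch: derive stable transactional ids from a stable pod identity
# (e.g. StatefulSet pod name "my-app-0") plus a per-concurrency-slot index.
# The "-txn-" format is an illustrative assumption.

def transactional_ids(pod_identity: str, concurrency: int) -> list[str]:
    """One stable transactional id per concurrency unit (thread/goroutine)."""
    return [f"{pod_identity}-txn-{slot}" for slot in range(concurrency)]

print(transactional_ids("my-app-0", 3))
# ['my-app-0-txn-0', 'my-app-0-txn-1', 'my-app-0-txn-2']
```

As long as the pod identity is stable across restarts and the concurrency never shrinks, a restarted process regenerates exactly the same ids and can complete or abort its in-flight transactions.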

On Wed, Aug 23, 2023 at 12:53 PM Artem Livshits
 wrote:

> Hi Roger,
>
> Thank you for the feedback.  You make a very good point that we also
> discussed internally.  Adding support for multiple concurrent
> transactions in one producer could be valuable but it seems to be a fairly
> large and independent change that would deserve a separate KIP.  If such
> support is added we could modify 2PC functionality to incorporate that.
>
> > Maybe not too bad but a bit of pain to manage these ids inside each
> process and across all application processes.
>
> I'm not sure if supporting multiple transactions in one producer would make
> id management simpler: we'd need to store a piece of data per transaction,
> so whether it's N producers with a single transaction or N transactions
> with a single producer, it's still roughly the same amount of data to
> manage.  In fact, managing transactional ids (current proposal) might be
> easier, because the id is controlled by the application and it knows how to
> complete the transaction after crash / restart; while a TID would be
> generated by Kafka and that would create a question of starting Kafka
> transaction, but not saving its TID and then crashing, then figuring out
> which transactions to abort, and so on.
>
> > 2) creating a separate producer for each concurrency slot in the
> application
>
> This is a very valid concern.  Maybe we'd need to have some multiplexing of
> transactional logical "streams" over the same connection.  Seems like a
> separate KIP, though.
>
> > Otherwise, it seems you're left with single-threaded model per
> application process?
>
> That's a fair assessment.  Not necessarily exactly single-threaded per
> application, but a single producer per thread model (i.e. an application
> could have a pool of threads + producers to increase concurrency).
>
> -Artem
>
> On Tue, Aug 22, 2023 at 7:22 PM Roger Hoover 
> wrote:
>
> > Artem,
> >
> > Thanks for the reply.
> >
> > If I understand correctly, Kafka does not support concurrent transactions
> > from the same producer (transactional id).  I think this means that
> > applications that want to support in-process concurrency (say
> thread-level
> > concurrency with row-level DB locking) would need to manage separate
> > transactional ids and producers per thread and then store txn state
> > accordingly.   The potential usability downsides I see are
> > 1) managing a set of transactional ids for each application process that
> > scales up to its max concurrency.  Maybe not too bad but a bit of pain
> to
> > manage these ids inside each process and across all application
> processes.
> > 2) creating a separate producer for each concurrency slot in the
> > application - this could create a lot more producers and resultant
> > connections to Kafka than the typical model of a single producer per
> > process.
> >
> > Otherwise, it seems you're left with single-threaded model per
> application
> > process?
> >
> > Thanks,
> >
> > Roger
> >
> > On Tue, Aug 22, 2023 at 5:11 PM Artem Livshits
> >  wrote:
> >
> > > Hi Roger, Arjun,
> > >
> > > Thank you for the questions.
> > > > It looks like the application must have stable transactional ids over
> > > time?
> > >
> > > The transactional id should uniquely identify a producer instance and
> > needs
> > > to be stable across the restarts.  If the transactional id is not
> stable
> > > a

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2023-08-22 Thread Roger Hoover
Artem,

Thanks for the reply.

If I understand correctly, Kafka does not support concurrent transactions
from the same producer (transactional id).  I think this means that
applications that want to support in-process concurrency (say thread-level
concurrency with row-level DB locking) would need to manage separate
transactional ids and producers per thread and then store txn state
accordingly.  The potential usability downsides I see are:
1) managing a set of transactional ids for each application process that
scales up to its max concurrency.  Maybe not too bad but a bit of pain to
manage these ids inside each process and across all application processes.
2) creating a separate producer for each concurrency slot in the
application - this could create a lot more producers and resultant
connections to Kafka than the typical model of a single producer per
process.

Otherwise, it seems you're left with a single-threaded model per application
process?

Thanks,

Roger
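The producer-per-concurrency-slot model described in point 2 can be sketched as a toy simulation (no real Kafka client involved; `FakeProducer` is a stand-in for a transactional producer bound to one transactional id):

```python
import queue
import threading

# Toy model of the "pool of threads + producers" pattern discussed in this
# thread: one producer (and transactional id) per concurrency slot, checked
# out for the duration of a transaction so that no two in-flight transactions
# ever share a transactional id.  Nothing here talks to a broker.

class FakeProducer:
    def __init__(self, transactional_id):
        self.transactional_id = transactional_id

pool = queue.Queue()
for slot in range(4):
    pool.put(FakeProducer(f"my-app-0-txn-{slot}"))

used = []
lock = threading.Lock()

def handle_request():
    producer = pool.get()          # blocks if all 4 slots are busy
    try:
        # real code would begin / produce / commit a transaction here
        with lock:
            used.append(producer.transactional_id)
    finally:
        pool.put(producer)         # return the slot to the pool

threads = [threading.Thread(target=handle_request) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(used), pool.qsize())  # 8 4
```

This shows the cost being discussed: concurrency is capped by the pool size, and each slot carries a dedicated producer (and connection) of its own.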

On Tue, Aug 22, 2023 at 5:11 PM Artem Livshits
 wrote:

> Hi Roger, Arjun,
>
> Thank you for the questions.
> > It looks like the application must have stable transactional ids over
> time?
>
> The transactional id should uniquely identify a producer instance and needs
> to be stable across the restarts.  If the transactional id is not stable
> across restarts, then zombie messages from a previous incarnation of the
> producer may violate atomicity.  If there are 2 producer instances
> concurrently producing data with the same transactional id, they are going
> to constantly fence each other and most likely make little or no progress.
>
> The name might be a little bit confusing as it may be mistaken for a
> transaction id / TID that uniquely identifies every transaction.  The name
> and the semantics were defined in the original exactly-once-semantics (EoS)
> proposal (
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging
> )
> and KIP-939 just builds on top of that.
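The fencing behavior described above can be illustrated with a toy model of per-transactional-id epochs (this mimics the broker-side bookkeeping conceptually; it is not real client or broker code):

```python
# Toy model of zombie fencing: the coordinator tracks an epoch per
# transactional id; re-initializing bumps the epoch, and writes from the
# older incarnation are rejected as zombies.

class TxnCoordinator:
    def __init__(self):
        self.epochs = {}  # transactional id -> current epoch

    def init_producer_id(self, txn_id):
        self.epochs[txn_id] = self.epochs.get(txn_id, -1) + 1
        return self.epochs[txn_id]

    def produce(self, txn_id, epoch):
        return "ok" if epoch == self.epochs[txn_id] else "fenced"

coord = TxnCoordinator()
old_epoch = coord.init_producer_id("my-app-0-txn-0")  # first incarnation
new_epoch = coord.init_producer_id("my-app-0-txn-0")  # restarted incarnation
print(coord.produce("my-app-0-txn-0", old_epoch))  # fenced
print(coord.produce("my-app-0-txn-0", new_epoch))  # ok
```

This is why a stable id is required for atomicity (the new incarnation fences the zombie) and why two live instances sharing an id fence each other back and forth.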
>
> > I'm curious to understand what happens if the producer dies, and does not
> come up and recover the pending transaction within the transaction timeout
> interval.
>
> If the producer / application never comes back, the transaction will remain
> in prepared (a.k.a. "in-doubt") state until an operator forcefully
> terminates the transaction.  That's why a new ACL is defined in
> this proposal -- this functionality should only be provided to applications
> that implement proper recovery logic.
>
> -Artem
>
> On Tue, Aug 22, 2023 at 12:52 AM Arjun Satish 
> wrote:
>
> > Hello Artem,
> >
> > Thanks for the KIP.
> >
> > I have the same question as Roger on concurrent writes, and an additional
> > one on consumer behavior. Typically, transactions will timeout if not
> > committed within some time interval. With the proposed changes in this
> KIP,
> > consumers cannot consume past the ongoing transaction. I'm curious to
> > understand what happens if the producer dies, and does not come up and
> > recover the pending transaction within the transaction timeout interval.
> Or
> > are we saying that when used in this 2PC context, we should configure
> these
> > transaction timeouts to very large durations?
> >
> > Thanks in advance!
> >
> > Best,
> > Arjun
> >
> >
> > On Mon, Aug 21, 2023 at 1:06 PM Roger Hoover 
> > wrote:
> >
> > > Hi Artem,
> > >
> > > Thanks for writing this KIP.  Can you clarify the requirements a bit
> more
> > > for managing transaction state?  It looks like the application must
> have
> > > stable transactional ids over time?   What is the granularity of those
> > ids
> > > and producers?  Say the application is a multi-threaded Java web
> server,
> > > can/should all the concurrent threads share a transactional id and
> > > producer?  That doesn't seem right to me unless the application is
> using
> > > global DB locks that serialize all requests.  Instead, if the
> application
> > > uses row-level DB locks, there could be multiple, concurrent,
> independent
> > > txns happening in the same JVM so it seems like the granularity
> managing
> > > transactional ids and txn state needs to line up with granularity of
> the
> > DB
> > > locking.
> > >
> > > Does that make sense or am I misunderstanding?
> > >
> > > Thanks,
> > >
> > > Roger
> > >
> > > On Wed, Aug 16, 2023 at 11:40 PM Artem Livshits
> > >  wrote:
> > >
> > > > Hello,
> > > >
> > > > This is a disc

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2023-08-21 Thread Roger Hoover
Hi Artem,

Thanks for writing this KIP.  Can you clarify the requirements a bit more
for managing transaction state?  It looks like the application must have
stable transactional ids over time?   What is the granularity of those ids
and producers?  Say the application is a multi-threaded Java web server,
can/should all the concurrent threads share a transactional id and
producer?  That doesn't seem right to me unless the application is using
global DB locks that serialize all requests.  Instead, if the application
uses row-level DB locks, there could be multiple, concurrent, independent
txns happening in the same JVM so it seems like the granularity managing
transactional ids and txn state needs to line up with granularity of the DB
locking.

Does that make sense or am I misunderstanding?

Thanks,

Roger

On Wed, Aug 16, 2023 at 11:40 PM Artem Livshits
 wrote:

> Hello,
>
> This is a discussion thread for
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-939%3A+Support+Participation+in+2PC
> .
>
> The KIP proposes extending Kafka transaction support (that already uses 2PC
> under the hood) to enable atomicity of dual writes to Kafka and an external
> database, and helps to fix a long-standing Flink issue.
>
> An example of code that uses the dual write recipe with JDBC and should
> work for most SQL databases is here
> https://github.com/apache/kafka/pull/14231.
>
> The FLIP for the sister fix in Flink is here
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=255071710
>
> -Artem
>
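The shape of the dual-write recipe can be summarized with a toy state machine (a rough sketch, not the linked PR's actual code): the DB commit is the decision point, and on restart the application completes or aborts the prepared Kafka transaction based on what the DB recorded.

```python
# Rough sketch of the dual-write recipe's shape: prepare the Kafka
# transaction, commit the DB transaction -- which atomically records the
# decision -- then complete the Kafka transaction.  After a crash, the DB
# is the source of truth for whether a prepared Kafka txn commits or aborts.

class KafkaTxn:
    def __init__(self):
        self.state = "ongoing"    # ongoing -> prepared -> committed/aborted

def dual_write(db, txn, crash_before_kafka_commit=False):
    txn.state = "prepared"        # phase 1: Kafka txn becomes in-doubt
    db["decision"] = "commit"     # phase 2 decision, inside the DB commit
    if crash_before_kafka_commit:
        return                    # simulate a crash; Kafka txn stays in-doubt
    txn.state = "committed"

def recover(db, txn):
    if txn.state == "prepared":   # found an in-doubt txn after restart
        txn.state = "committed" if db.get("decision") == "commit" else "aborted"

db, txn = {}, KafkaTxn()
dual_write(db, txn, crash_before_kafka_commit=True)
recover(db, txn)
print(txn.state)  # committed -- the DB's recorded decision wins after recovery
```

If the crash happens before the DB commit, no decision is recorded and recovery aborts the prepared transaction instead, so the two systems stay atomic either way.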


Re: [VOTE] KIP-602: Change default value for client.dns.lookup

2020-05-22 Thread Roger Hoover
+1 (non-binding)  Thanks, Badai.

On Fri, May 22, 2020 at 10:05 AM Ismael Juma  wrote:

> Thanks for the KIP, +1 (binding).
>
> Ismael
>
> On Fri, May 22, 2020 at 1:40 AM Badai Aqrandista 
> wrote:
>
> > Hi All,
> >
> > I would like to start the vote on KIP-602: Change default value for
> > client.dns.lookup
> >
> > For reference, here is the KIP wiki:
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-602%3A+Change+default+value+for+client.dns.lookup
> >
> > And discussion thread:
> >
> >
> >
> https://lists.apache.org/thread.html/r0e70d3757267c4158f12c05a4e5ac9eb33f2d11ce99d5878b3b4b3f7%40%3Cdev.kafka.apache.org%3E
> >
> > --
> > Thanks,
> > Badai
> >
>


Re: [DISCUSS] KIP-196: Add metrics to Kafka Connect framework

2017-09-13 Thread Roger Hoover
Sorry, one more thing occurred to me.  Can the names of the time-based
metrics include their units?  That makes it much easier for people
consuming the metrics to interpret them correctly.

For example, offset-commit-max-time would become offset-commit-max-time-ms
or offset-commit-max-time-microsecs (-us?) or whatever you plan to make the
unit be.
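The suggestion above is just a naming convention; a trivial sketch of applying it (the metric names and the "-ms" suffix here are illustrative assumptions):

```python
# Sketch: give time-based metric names an explicit unit suffix so consumers
# of the metrics never have to guess the unit.

def time_metric(base, unit="ms"):
    return f"{base}-{unit}"

print(time_metric("offset-commit-max-time"))        # offset-commit-max-time-ms
print(time_metric("offset-commit-max-time", "us"))  # offset-commit-max-time-us
```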

On Tue, Sep 12, 2017 at 6:19 PM, Sriram Subramanian <r...@confluent.io>
wrote:

> FWIW, I agree that time metrics have been very useful in the past. The
> reasoning around perf overhead seems reasonable as well. Can we agree on a
> subset of time metrics that we feel would be super useful for debugging?
>
> On Tue, Sep 12, 2017 at 6:08 PM, Roger Hoover <roger.hoo...@gmail.com>
> wrote:
>
> > Thanks, Ewen.
> >
> > I agree with you on the overhead of measuring time for SMTs and
> > converters.  I'd still argue for keeping other metrics like flush time
> b/c
> > even small batches should still be small overhead compared to writing to
> a
> > sink.
> >
> > On Tue, Sep 12, 2017 at 3:06 PM, Ewen Cheslack-Postava <
> e...@confluent.io>
> > wrote:
> >
> > > Requests are generally substantial batches of data, you are not
> > guaranteed
> > > that for the processing batches both because source connectors can hand
> > you
> > > batches of whatever size they want and consumer's max.poll.records can
> be
> > > overridden.
> > >
> > > Both SMTs and converters are a concern because they can both be
> > relatively
> > > cheap such that just checking the time in between them could possibly
> > dwarf
> > > the cost of applying them.
> > >
> > > Also, another thought re: rebalance metrics: we are already getting
> some
> > > info via AbstractCoordinator and those actually provide a bit more
> detail
> > > in some ways (e.g. join & sync vs the entire rebalance). Not sure if we
> > > want to effectively duplicate some info so it can all be located under
> > > Connect names or rely on the existing metrics for some of these.
> > >
> > > -Ewen
> > >
> > > On Tue, Sep 12, 2017 at 2:05 PM, Roger Hoover <roger.hoo...@gmail.com>
> > > wrote:
> > >
> > > > Ewen,
> > > >
> > > > I don't know the details of the perf concern.  How is it that the
> Kafka
> > > > broker can keep latency stats per request without suffering too much
> > > > performance?  Maybe SMTs are the only concern b/c they are
> per-message.
> > > If
> > > > so, let's remove those and keep timing info for everything else like
> > > > flushes, which are batch-based.
> > > >
> > > >
> > > > On Tue, Sep 12, 2017 at 1:32 PM, Ewen Cheslack-Postava <
> > > e...@confluent.io>
> > > > wrote:
> > > >
> > > > > On Tue, Sep 12, 2017 at 10:55 AM, Gwen Shapira <g...@confluent.io>
> > > > wrote:
> > > > >
> > > > > > Ewen, you gave a nice talk at Kafka Summit where you warned about
> > the
> > > > > > danger of SMTs that slow down the data pipe. If we don't provide
> > the
> > > > time
> > > > > > metrics, how will users know when their SMTs are causing
> > performance
> > > > > > issues?
> > > > > >
> > > > >
> > > > > Metrics aren't the only way to gain insight about performance and
> > > always
> > > > > measuring this even when it's not necessarily being used may not
> make
> > > > > sense. SMT authors are much better off starting out with a JMH or
> > > similar
> > > > > benchmark. What I was referring to in the talk is more about
> > > > understanding
> > > > > that the processing for SMTs is entirely synchronous and that means
> > > > certain
> > > > > classes of operations will just generally be a bad idea, e.g.
> > anything
> > > > that
> > > > > goes out over the network to another service. You don't even really
> > > need
> > > > > performance info to determine that that type of transformation will
> > > cause
> > > > > problems.
> > > > >
> > > > > But my point wasn't that timing info isn't useful. It's that we
> know
> > > that
> > > > > getting timestamps is pretty expensive and we'll already be doing
> so
> > > > > elsewhere (e.g. if a source record doe

Re: [DISCUSS] KIP-196: Add metrics to Kafka Connect framework

2017-09-12 Thread Roger Hoover
Thanks, Ewen.

I agree with you on the overhead of measuring time for SMTs and
converters.  I'd still argue for keeping other metrics like flush time b/c
even small batches should still be small overhead compared to writing to a
sink.

On Tue, Sep 12, 2017 at 3:06 PM, Ewen Cheslack-Postava <e...@confluent.io>
wrote:

> Requests are generally substantial batches of data, you are not guaranteed
> that for the processing batches both because source connectors can hand you
> batches of whatever size they want and consumer's max.poll.records can be
> overridden.
>
> Both SMTs and converters are a concern because they can both be relatively
> cheap such that just checking the time in between them could possibly dwarf
> the cost of applying them.
>
> Also, another thought re: rebalance metrics: we are already getting some
> info via AbstractCoordinator and those actually provide a bit more detail
> in some ways (e.g. join & sync vs the entire rebalance). Not sure if we
> want to effectively duplicate some info so it can all be located under
> Connect names or rely on the existing metrics for some of these.
>
> -Ewen
>
> On Tue, Sep 12, 2017 at 2:05 PM, Roger Hoover <roger.hoo...@gmail.com>
> wrote:
>
> > Ewen,
> >
> > I don't know the details of the perf concern.  How is it that the Kafka
> > broker can keep latency stats per request without suffering too much
> > performance?  Maybe SMTs are the only concern b/c they are per-message.
> If
> > so, let's remove those and keep timing info for everything else like
> > flushes, which are batch-based.
> >
> >
> > On Tue, Sep 12, 2017 at 1:32 PM, Ewen Cheslack-Postava <
> e...@confluent.io>
> > wrote:
> >
> > > On Tue, Sep 12, 2017 at 10:55 AM, Gwen Shapira <g...@confluent.io>
> > wrote:
> > >
> > > > Ewen, you gave a nice talk at Kafka Summit where you warned about the
> > > > danger of SMTs that slow down the data pipe. If we don't provide the
> > time
> > > > metrics, how will users know when their SMTs are causing performance
> > > > issues?
> > > >
> > >
> > > Metrics aren't the only way to gain insight about performance and
> always
> > > measuring this even when it's not necessarily being used may not make
> > > sense. SMT authors are much better off starting out with a JMH or
> similar
> > > benchmark. What I was referring to in the talk is more about
> > understanding
> > > that the processing for SMTs is entirely synchronous and that means
> > certain
> > > classes of operations will just generally be a bad idea, e.g. anything
> > that
> > > goes out over the network to another service. You don't even really
> need
> > > performance info to determine that that type of transformation will
> cause
> > > problems.
> > >
> > > But my point wasn't that timing info isn't useful. It's that we know
> that
> > > getting timestamps is pretty expensive and we'll already be doing so
> > > elsewhere (e.g. if a source record doesn't include a timestamp). For
> some
> > > use cases such as ByteArrayConverter + no SMTs + lightweight processing
> > > (e.g. just gets handed to a background thread that deals with sending
> the
> > > data), it wouldn't be out of the question that adding 4 or so more
> calls
> > to
> > > get timestamps could become a bottleneck. Since I don't know if it
> would
> > > but we have definitely seen the issue come up before, I would be
> > > conservative in adding the metrics unless we had some numbers showing
> it
> > > doesn't matter or doesn't matter much.
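The timestamp-overhead concern is easy to get a rough feel for (an illustrative micro-measurement, not a substitute for the JMH benchmark suggested above; absolute numbers vary widely by platform):

```python
import time

# Rough micro-measurement of the overhead being debated: the cost of simply
# taking timestamps in a hot loop.  time.perf_counter() stands in for
# System.nanoTime()-style calls; only the order of magnitude is the point.

N = 100_000
start = time.perf_counter()
for _ in range(N):
    time.perf_counter()
per_call_ns = (time.perf_counter() - start) / N * 1e9
print(per_call_ns > 0)  # True; typically tens of nanoseconds per call
```

Tens of nanoseconds is negligible per batch but can matter if several such calls are added per record on a cheap path like ByteArrayConverter with no SMTs.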
> > >
> > > In general, I don't think metrics that require always-on measurement
> are
> > a
> > > good way to get fine grained performance information. Instrumenting
> > > different phases that imply different types of performance problems can
> > be
> > > helpful (e.g. "processing time" that should be CPU/memory throughput
> > bound
> > > vs. "send time" that, at least for many connectors, is more likely to
> be
> > IO
> > > bound), but if you want finer-grained details, you probably either want
> > > something that can be toggled on/off temporarily or just use a tool
> > that's
> > > really designed for the job, i.e. a profiler like perf.
> > >
> > > -Ewen
> > >
> > >
> > > >
> > > > Gwen
> > > >
> > > > On Mon, Sep 11, 2017 at 7:50 PM Ewen Cheslack-Postava <
> > e...@confluent

Re: [DISCUSS] KIP-196: Add metrics to Kafka Connect framework

2017-09-12 Thread Roger Hoover
source record rate at the (worker,
> > > > connector)
> > > > >> level really useful since you're just going to need to do
> additional
> > > > >> aggregation anyway once you've collected metrics across all
> workers?
> > > I'd
> > > > >> rather add a smaller number of metrics w/ clear use cases than
> just
> > > try
> > > > to
> > > > >> be exhaustive and then have to maintain stuff that nobody actually
> > > uses.
> > > > >>
> > > > >
> > > > > Yes, the connector aggregate metrics are maybe not as useful if you
> > > also
> > > > > have to aggregate them from different workers. Removing them
> probably
> > > > also
> > > > > reduces the risk of them being misinterpretted.
> > > > >
> > > > >
> > > > >> * You have status for connectors but not for tasks. Any reason
> why?
> > > > Seems
> > > > >> like it'd make sense to expose both, especially since users
> > generally
> > > > care
> > > > >> about task status more than connector status (not many connectors
> > > > actually
> > > > >> run a monitoring thread.)
> > > > >>
> > > > >
> > > > > Ack.
> > > > >
> > > > >
> > > > >> * Is number of tasks for each connector a useful metric? Not sure
> > > > whether
> > > > >> someone would find this useful or not. Probably not for alerts,
> but
> > > > might
> > > > >> be useful to be able to check it via your metrics dashboard.
> > > > >>
> > > > >
> > > > > Seems like it might be useful, at least in terms of tracking the
> > number
> > > > of
> > > > > tasks over time. Might not be as useful for connectors that have
> > > > relatively
> > > > > static tasks, but it would be more interesting/useful for
> connectors
> > > that
> > > > > create tasks dynamically and periodically request task
> > > reconfigurations.
> > > > >
> > > > >
> > > > >> * Same questions re: granularity of sink tasks/connectors timing
> and
> > > > >> whether the connectors need all the roll-ups of individual
> (worker,
> > > > task)
> > > > >> values to (worker, connector) level.
> > > > >>
> > > > >
> > > > > I'm fine with taking out the aggregates to keep things simple and
> > > prevent
> > > > > misunderstanding.
> > > > >
> > > > >
> > > > >> * If we expose who the worker currently thinks is the leader, it
> > might
> > > > >> also
> > > > >> make sense to expose the underlying epoch. Not actually sure if we
> > > > expose
> > > > >> that for the consumer today, but it's an indicator of who is
> > properly
> > > up
> > > > >> to
> > > > >> date.
> > > > >>
> > > > >
> > > > > Ack.
> > > > >
> > > > >
> > > > >> * Why worker-level offset commit stats? It's not clear to me that
> > > these
> > > > >> are
> > > > >> useful without considering the specific connector.
> > > > >>
> > > > >
> > > > > So would they make more sense on the tasks? Again, on the worker
> > > they're
> > > > > aggregates.
> > > > >
> > > > >
> > > > >>
> > > > >> -Ewen
> > > > >>
> > > > >>
> > > > >> On Mon, Sep 11, 2017 at 9:43 AM, Randall Hauch <rha...@gmail.com>
> > > > wrote:
> > > > >>
> > > > >> > Thanks for reviewing. Responses inline below.
> > > > >> >
> > > > >> > On Mon, Sep 11, 2017 at 11:22 AM, Roger Hoover <
> > > > roger.hoo...@gmail.com>
> > > > >> > wrote:
> > > > >> >
> > > > >> > > Randall,
> > > > >> > >
> > > > >> > > Thank you for the KIP.  This should improve visibility
> > greatly.  I
> > > > >> had a
> > > > >> > > few questions/ideas fo

Re: [DISCUSS] KIP-196: Add metrics to Kafka Connect framework

2017-09-12 Thread Roger Hoover
>
> > >> * If we include the worker id everywhere and don't have metrics
> without
> > >> that included, isn't that a pain for users that dump this data into
> some
> > >> other system? They have to know which worker the connector/task is
> > >> currently on *or* need to do extra work to merge the metrics from
> across
> > >> machines. Including versions with the worker ID can make sense for
> > >> completeness and accuracy (e.g. technically there are still very slim
> > >> risks
> > >> of having a task running twice due to zombies), but it seems like bad
> > >> usability for the common case.
> > >>
> > >
> > > Part of the reason was also to help identify where each of the metrics
> > > came from, but per the next comment this may not be as useful, either.
> > > So remove the worker ID in all the task and connector metric names?
> What
> > > about the worker metrics?
> > >
> > >
> > >> * Is aggregating things like source record rate at the (worker,
> > connector)
> > >> level really useful since you're just going to need to do additional
> > >> aggregation anyway once you've collected metrics across all workers?
> I'd
> > >> rather add a smaller number of metrics w/ clear use cases than just
> try
> > to
> > >> be exhaustive and then have to maintain stuff that nobody actually
> uses.
> > >>
> > >
> > > Yes, the connector aggregate metrics are maybe not as useful if you
> also
> > > have to aggregate them from different workers. Removing them probably
> > also
> > > reduces the risk of them being misinterpretted.
> > >
> > >
> > >> * You have status for connectors but not for tasks. Any reason why?
> > Seems
> > >> like it'd make sense to expose both, especially since users generally
> > care
> > >> about task status more than connector status (not many connectors
> > actually
> > >> run a monitoring thread.)
> > >>
> > >
> > > Ack.
> > >
> > >
> > >> * Is number of tasks for each connector a useful metric? Not sure
> > whether
> > >> someone would find this useful or not. Probably not for alerts, but
> > might
> > >> be useful to be able to check it via your metrics dashboard.
> > >>
> > >
> > > Seems like it might be useful, at least in terms of tracking the number
> > of
> > > tasks over time. Might not be as useful for connectors that have
> > relatively
> > > static tasks, but it would be more interesting/useful for connectors
> that
> > > create tasks dynamically and periodically request task
> reconfigurations.
> > >
> > >
> > >> * Same questions re: granularity of sink tasks/connectors timing and
> > >> whether the connectors need all the roll-ups of individual (worker,
> > task)
> > >> values to (worker, connector) level.
> > >>
> > >
> > > I'm fine with taking out the aggregates to keep things simple and
> prevent
> > > misunderstanding.
> > >
> > >
> > >> * If we expose who the worker currently thinks is the leader, it might
> > >> also
> > >> make sense to expose the underlying epoch. Not actually sure if we
> > expose
> > >> that for the consumer today, but it's an indicator of who is properly
> up
> > >> to
> > >> date.
> > >>
> > >
> > > Ack.
> > >
> > >
> > >> * Why worker-level offset commit stats? It's not clear to me that
> these
> > >> are
> > >> useful without considering the specific connector.
> > >>
> > >
> > > So would they make more sense on the tasks? Again, on the worker
> they're
> > > aggregates.
> > >
> > >
> > >>
> > >> -Ewen
> > >>
> > >>
> > >> On Mon, Sep 11, 2017 at 9:43 AM, Randall Hauch <rha...@gmail.com>
> > wrote:
> > >>
> > >> > Thanks for reviewing. Responses inline below.
> > >> >
> > >> > On Mon, Sep 11, 2017 at 11:22 AM, Roger Hoover <
> > roger.hoo...@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > Randall,
> > >> > >
> > >> > > Thank you for the KIP.  This should improve visibility greatly.  I
> > >

Re: [DISCUSS] KIP-196: Add metrics to Kafka Connect framework

2017-09-11 Thread Roger Hoover
Randall,

Thank you for the KIP.  This should improve visibility greatly.  I had a
few questions/ideas for more metrics.


   1. What's the relationship between the worker state and the connector
   status?  Does the 'paused' status at the Connector level include the time
   that the worker is 'rebalancing'?
   2. Are the "Source Connector" metrics like record rate an aggregation of
   the "Source Task" metrics?
      - How much value is there in monitoring at the "Source Connector"
      level (other than status) if the number of constituent tasks may change
      over time?
      - I'm imagining that it's most useful to collect metrics at the task
      level, as the task-level metrics should be stable regardless of tasks
      shifting to different workers.
      - If so, can we duplicate the Connector Status down at the task level
      so that all important metrics can be tracked by task?
   3. For the Sink Task metrics:
      - Can we add offset lag and timestamp lag on commit?  After records
      are flushed/committed:
         - What is the diff between the record timestamps and commit time
         (histogram)?  This is a measure of end-to-end pipeline latency.
         - What is the diff between record offsets and the latest offset of
         their partition at commit time (histogram)?  This is a measure of
         whether this particular task is keeping up.
      - How about flush error rate?  Assuming the sink connectors are using
      retries, it would be helpful to know how many errors they're seeing.
      - Can we tell at the framework level how many records were inserted
      vs. updated vs. deleted?
      - Batching stats:
         - Histogram of flush batch size
         - Counts of flush trigger method (time vs. max batch size)
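
The commit-time lag measures suggested in item 3 could be computed roughly as
follows. This is a minimal standalone sketch, not Connect framework code; the
class and method names (`SinkLagSketch`, `timestampLagMs`, `offsetLag`) are
invented here for illustration:

```java
// Hypothetical sketch of the two commit-time lag measures suggested above.
// These helpers are illustrative only -- they are not part of the Connect
// framework, and all names are invented for this example.
public class SinkLagSketch {

    // End-to-end pipeline latency: wall-clock commit time minus the record's
    // timestamp, suitable for feeding a histogram at flush/commit time.
    static long timestampLagMs(long recordTimestampMs, long commitTimeMs) {
        return commitTimeMs - recordTimestampMs;
    }

    // "Keeping up" measure: the partition's latest (log-end) offset at
    // commit time minus the record's offset.
    static long offsetLag(long logEndOffset, long recordOffset) {
        return logEndOffset - recordOffset;
    }

    public static void main(String[] args) {
        // A record stamped at t=1000 ms, committed at t=1500 ms, at offset
        // 950 in a partition whose log-end offset is 1000 at commit time:
        System.out.println(timestampLagMs(1000L, 1500L)); // 500
        System.out.println(offsetLag(1000L, 950L));       // 50
    }
}
```

Both values would be recorded per task at each offset commit and exposed as
histograms.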

Cheers,

Roger

On Sun, Sep 10, 2017 at 8:45 AM, Randall Hauch  wrote:

> Thanks, Gwen.
>
> That's a great idea, so I've changed the KIP to add those metrics. I've
> also made a few other changes:
>
>
>    1. The context of all metrics is limited to the activity within the
>    worker. This wasn't clear before, so I changed the motivation and metric
>    descriptions to explicitly state this.
>    2. Added the worker ID to all MBean attributes. In addition to hopefully
>    making this same scope obvious from within JMX or other metric reporting
>    systems, this is similar to how the Kafka producer and consumer metrics
>    include the client ID in their MBean attributes. Hopefully this does not
>    negatively impact or complicate how external reporting systems aggregate
>    metrics from multiple workers.
>    3. Stated explicitly that aggregating metrics across workers was out of
>    scope of this KIP.
>    4. Added metrics to report the connector class and version for both sink
>    and source connectors.
>
> Check this KIP's history for details of these changes.
>
> Please let me know if you have any other suggestions. I hope to start the
> voting soon!
>
> Best regards,
>
> Randall
>
> On Thu, Sep 7, 2017 at 9:35 PM, Gwen Shapira  wrote:
>
> > Thanks for the KIP, Randall. Those are badly needed!
> >
> > Can we have two metrics with record rate per task? One before SMT and one
> > after?
> > We can have cases where we read 5000 rows from JDBC but write 5 to Kafka,
> > or read 5000 records from Kafka and write 5 due to filtering. I think it's
> > important to know both numbers.
> >
> >
> > Gwen
> >
> > On Thu, Sep 7, 2017 at 7:50 PM, Randall Hauch  wrote:
> >
> > > Hi everyone.
> > >
> > > I've created a new KIP to add metrics to the Kafka Connect framework:
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 196%3A+Add+metrics+to+Kafka+Connect+framework
> > >
> > > The KIP approval deadline is looming, so if you're interested in Kafka
> > > Connect metrics please review and provide feedback as soon as possible.
> > I'm
> > > interested not only in whether the metrics are sufficient and
> > appropriate,
> > > but also in whether the MBean naming conventions are okay.
> > >
> > > Best regards,
> > >
> > > Randall
> > >
> >
> >
> >
> > --
> > *Gwen Shapira*
> > Product Manager | Confluent
> > 650.450.2760 | @gwenshap
> > Follow us: Twitter  | blog
> > 
> >
>


Re: [DISCUSS] KIP-170: Enhanced TopicCreatePolicy and introduction of TopicDeletePolicy

2017-09-07 Thread Roger Hoover
Edoardo, thanks for the KIP.  I think it's a good idea overall.

+1 especially for including Session/Principal in the API.  (#2 mentioned by
Ismael)

Also, the AlterPolicy should get the same info as create + delete (#4).
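
To make the suggestion concrete, here is a standalone sketch of what a delete
policy carrying the principal might look like. All names here are hypothetical
(KIP-170 had not settled on any of them); it is modeled loosely on Kafka's
existing CreateTopicPolicy, with a plain String standing in for the
Session/KafkaPrincipal types.

```java
// Hypothetical sketch only -- none of these names were finalized by KIP-170.
// Modeled loosely on Kafka's existing CreateTopicPolicy, but kept standalone
// (a plain String stands in for Kafka's Session/KafkaPrincipal).
interface TopicDeletePolicy {
    // A dedicated metadata class (point 3 in the quoted discussion below),
    // so fields can be added later without breaking implementations.
    final class RequestMetadata {
        final String topic;
        final String principal; // e.g. "User:alice"
        RequestMetadata(String topic, String principal) {
            this.topic = topic;
            this.principal = principal;
        }
    }

    void validate(RequestMetadata meta); // throws on policy violation
}

public class DeletePolicySketch {
    // Example policy: only the admin principal may delete "internal" topics.
    static class ProtectedPrefixPolicy implements TopicDeletePolicy {
        @Override
        public void validate(RequestMetadata meta) {
            if (meta.topic.startsWith("__") && !meta.principal.equals("User:admin")) {
                throw new IllegalStateException(
                    meta.principal + " may not delete internal topic " + meta.topic);
            }
        }
    }

    public static void main(String[] args) {
        TopicDeletePolicy policy = new ProtectedPrefixPolicy();
        policy.validate(new TopicDeletePolicy.RequestMetadata("orders", "User:alice")); // allowed
        try {
            policy.validate(new TopicDeletePolicy.RequestMetadata("__offsets", "User:alice"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The same RequestMetadata shape would also let an alter policy receive
identical information, per point 4.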

Cheers,

Roger

On Thu, Sep 7, 2017 at 8:43 AM, Ismael Juma  wrote:

> Hi Tom,
>
> Maybe we can discuss that as part of KIP-179. I don't think we should
> introduce a separate interface for the changes suggested here.
>
> Ismael
>
> On Tue, Sep 5, 2017 at 7:05 PM, Tom Bentley  wrote:
>
> > Hi Edoardo,
> >
> > KIP-179 will support changing topic configurations via the AdminClient
> and
> > this would open a loophole to avoid the current CreateTopicPolicy:
> >
> > 1. Create a topic that sticks to the policy
> > 2. Modify the topic after creation to something which would have violated
> > the policy.
> >
> > For this reason KIP-179 proposes to apply the create topic policy to
> topic
> > modification too. If there were going to be a new "V2" interface (as
> > opposed to changing the RequestMetadata) I think it would be beneficial
> if
> > the name reflected that it applied to updates too.
> > CreateOrAlterTopicPolicy? Though obviously that name doesn't reflect that
> > its heritage in the original CreateTopicPolicy.
> >
> > Cheers,
> >
> > Tom
> >
> > On 5 September 2017 at 18:48, Edoardo Comar  wrote:
> >
> > > Hi Ismael,
> > > I was on leave for a long while. I will update the KIP.
> > >
> > > Edo
> > >
> > > On 5 September 2017 at 11:42, Ismael Juma  wrote:
> > >
> > > > Hi Edoardo,
> > > >
> > > > Do you intend to update the KIP to avoid the introduction of another
> > > > interface?
> > > >
> > > > Ismael
> > > >
> > > > On Fri, Jun 23, 2017 at 5:47 PM, Edoardo Comar 
> > > wrote:
> > > >
> > > > > Thanks for the thoughts Ismael
> > > > >
> > > > > > 1. Have you considered extending RequestMetadata with the
> > additional
> > > > > > information you need? We could add Cluster to it, which has topic
> > > > > > assignment information, for example. This way, there would be no
> > need
> > > > > for a
> > > > > > V2 interface.
> > > > >
> > > > > 1. I hadn't thought of it - but this seems a feasible alternative.
> > > > >
> > > > > The XXXPolicy.RequestMetadata could be enriched to include more
> > > > > information -
> > > > > for backward compatibility with existing Policies we would only add
> > > > > methods to these classes.
> > > > >
> > > > > Presumably the number of Admin Request is not huge so we should not
> > be
> > > > too
> > > > > worried
> > > > > about the overhead of populating a org.apache.kafka.common.Cluster
> > for
> > > > > every admin request , right ?
> > > > >
> > > > > The only feature I can't see how to easily support with the
> enriched
> > > > > XXXPolicy.RequestMetadata is how to check
> > > > > for a topic being marked for deletion using the Cluster information
> > > > >
> > > > > This check is useful for us in excluding such outliers when
> counting
> > > the
> > > > > number of partitions.
> > > > >
> > > > >
> > > > > > 2. Something else that could be useful is passing an instance of
> > > > > `Session`
> > > > > > so that one can provide custom behaviour depending on the logged
> in
> > > > > user.
> > > > > > Would this be useful?
> > > > > 2. Definitely I would expect that a general solution includes the
> > > Session
> > > > > or the KafkaPrincipal associated with it
> > > > > (the latter may be a simpler dependency for the
> > > XXXPolicy.RequestMetadata)
> > > > >
> > > > >
> > > > > > 3. For the delete case, we may consider passing a class instead
> of
> > > just
> > > > > a
> > > > > > string to the validate method so that we have options if we need
> to
> > > > > extend
> > > > > > it.
> > > > >
> > > > > 3. Agree, we should have the DeletePolicy define its
> RequestMetadata
> > > > > class, too
> > > > >
> > > > >
> > > > > > 4. Do we want to enhance the AlterConfigs policy as well?
> > > > >
> > > > > 4. I don't see why not :-)
> > > > >
> > > > >
> > > > >
> > > > > thanks
> > > > > Edo
> > > > > --
> > > > >
> > > > > Edoardo Comar
> > > > >
> > > > > IBM Message Hub
> > > > >
> > > > >
> > > > > isma...@gmail.com wrote on 22/06/2017 15:05:06:
> > > > >
> > > > > > From: Ismael Juma 
> > > > > > To: dev@kafka.apache.org
> > > > > > Date: 22/06/2017 15:05
> > > > > > Subject: Re: [DISCUSS] KIP-170: Enhanced TopicCreatePolicy and
> > > > > > introduction of TopicDeletePolicy
> > > > > > Sent by: isma...@gmail.com
> > > > > >
> > > > > > Thanks for the KIP, Edoardo. A few comments:
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Thu, Jun 22, 2017 at 2:41 PM, Edoardo Comar <
> eco...@uk.ibm.com>
> > > > > wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > We've drafted 

Re: [DISCUSS] KIP-185: Make exactly once in order delivery per partition the default producer setting

2017-08-31 Thread Roger Hoover
Makes sense in terms of priorities.  Thanks, Apurva.

On Thu, Aug 31, 2017 at 11:15 AM, Apurva Mehta <apu...@confluent.io> wrote:

> Thanks for the message, Roger.
>
> I think having 'acks=all' imply 'acks=minIsr' will probably result in some
> improvement in the latency. However, I would note a few things:
>
> 1. The numbers on the wiki are latency at max throughput, which should not
> be representative of actual latency degradation. We should aim for more
> representative numbers before trying to solve for the problem. I intend to
> do this.
> 2. We have some toggles to reduce replication latency like
> ReplicaFetchMinBytes, ReplicaFetchWaitMaxMs. We can improve the replication
> latency by tuning these variables.
> 3. Engineers from Uber have played with the settings above and noticed that
> our protocol is too inefficient to support frequent replica fetches. We
> plan to improve this in the coming months.
>
> If we solve the low hanging fruit in (3), we can probably have lower
> latency configs in (2), which would mean that the actual impact of acks=all
> reduces. In this context, acks=minIsr will probably show a smaller impact
> on reducing replication latency. But it would definitely be interesting to
> get those numbers and see whether the theory bears out in practice.
>
> Regards,
> Apurva
>
> On Thu, Aug 31, 2017 at 8:56 AM, Roger Hoover <roger.hoo...@gmail.com>
> wrote:
>
> > Sorry, my math was sloppy.  It's not twice as many requests taking
> longer.
> > If the probability of replication latency longer than X is Px for both
> > replicas then,
> >
> > acks=all will have probability of Px(2-Px) of replication lag longer
> than X
> > while
> > acks=minIsr will be Px
> >
> >
> > On Wed, Aug 30, 2017 at 5:18 PM, Roger Hoover <roger.hoo...@gmail.com>
> > wrote:
> >
> > > Sorry if this is a bit out of left field but can't help wondering...
> > >
> > > One way to improve producer performance while still having good
> > guarantees
> > > would be to allow a setting between acks=1 and acks=all.  We could
> > > introduce "acks=minIsr".  This is already the guarantee you get when
> the
> > > ISR set shrinks below your replication factor.  Why not allow producers
> > to
> > > get notified when minIsr replication has been achieved even when the
> ISR
> > > set is full?
> > >
> > > For rep factor == 3 and min.in.sync.replicas == 2 and sizeOf(ISR) == 3:
> > > * with acks=all, the remote time of each request will be max(lag of 2
> > > followers) whereas
> > > * with acks=minIsr, the remote time of each request will be min(lag of
> 2
> > > followers)
> > >
> > > Whatever your latency distribution is for replication, for any given
> > > remote time (say 100 ms), twice as many requests take longer than that
> > time
> > > with acks=all vs acks=minIsr.
> > >
> > > Thoughts?
> > >
> > > Roger
> > >
> > > On Wed, Aug 30, 2017 at 4:52 PM, Apurva Mehta <apu...@confluent.io>
> > wrote:
> > >
> > >> Hi Ted, int16 is sufficient. I forgot to specify initially. I have
> > updated
> > >> the KIP.
> > >>
> > >> Thanks for pointing it out!
> > >> Apurva
> > >>
> > >> On Wed, Aug 30, 2017 at 4:43 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >>
> > >> > For ProduceRequest v4, would int32 or int16 be enough for
> > >> idempotenceLevel
> > >> > ?
> > >> >
> > >> > Cheers
> > >> >
> > >> > On Wed, Aug 30, 2017 at 3:47 PM, Apurva Mehta <apu...@confluent.io>
> > >> wrote:
> > >> >
> > >> > > Thanks Ismael and Jason, I filed a separate KIP to solve the
> > problems
> > >> > > identified through this discussion. I also incorporated Jason's
> > >> comments
> > >> > in
> > >> > > that document:
> > >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >> > > 192+%3A+Provide+cleaner+semantics+when+idempotence+is+enabled
> > >> > >
> > >> > > Please have a look,
> > >> > > Apurva
> > >> > >
> > >> > > On Tue, Aug 29, 2017 at 3:28 AM, Ismael Juma <ism...@juma.me.uk>
> > >> wrote:
> > >> > >
> > >> > > > Thanks for the proposals. I think they make sense and I also
> agree
> > 

Re: [DISCUSS] KIP-185: Make exactly once in order delivery per partition the default producer setting

2017-08-31 Thread Roger Hoover
Sorry, my math was sloppy.  It's not twice as many requests taking longer.
If the probability of replication latency longer than X is Px for both
replicas then,

acks=all will have probability of Px(2-Px) of replication lag longer than X
while
acks=minIsr will be Px
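
A quick numeric check of the tails above, under the assumption that the two
followers' lags are independent and identically distributed with tail
probability Px. Under that assumption the acks=all tail is Px(2-Px) as stated,
while the acks=minIsr tail works out to Px^2; the Px figure quoted above is
its upper bound, reached when the two lags are perfectly correlated.

```java
// Closed-form check of the replication-lag tail probabilities discussed
// above, assuming the two followers' lags are independent and identically
// distributed with P(lag > X) = px.
public class AcksTailSketch {
    public static void main(String[] args) {
        double px = 0.01; // P(one follower's replication lag exceeds X)

        // acks=all waits for max(lag of 2 followers):
        // P(max > X) = 1 - (1 - px)^2 = px * (2 - px)
        double acksAll = px * (2 - px);

        // acks=minIsr waits for min(lag of 2 followers):
        // P(min > X) = px^2 under independence (at most px if correlated)
        double acksMinIsr = px * px;

        System.out.printf("acks=all tail:    %.4f%n", acksAll);    // 0.0199
        System.out.printf("acks=minIsr tail: %.6f%n", acksMinIsr); // 0.000100
    }
}
```

So for a 1% per-follower tail, acks=all sees roughly a 2% request tail while
acks=minIsr sees somewhere between 0.01% (independent) and 1% (fully
correlated).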


On Wed, Aug 30, 2017 at 5:18 PM, Roger Hoover <roger.hoo...@gmail.com>
wrote:

> Sorry if this is a bit out of left field but can't help wondering...
>
> One way to improve producer performance while still having good guarantees
> would be to allow a setting between acks=1 and acks=all.  We could
> introduce "acks=minIsr".  This is already the guarantee you get when the
> ISR set shrinks below your replication factor.  Why not allow producers to
> get notified when minIsr replication has been achieved even when the ISR
> set is full?
>
> For rep factor == 3 and min.in.sync.replicas == 2 and sizeOf(ISR) == 3:
> * with acks=all, the remote time of each request will be max(lag of 2
> followers) whereas
> * with acks=minIsr, the remote time of each request will be min(lag of 2
> followers)
>
> Whatever your latency distribution is for replication, for any given
> remote time (say 100 ms), twice as many requests take longer than that time
> with acks=all vs acks=minIsr.
>
> Thoughts?
>
> Roger
>
> On Wed, Aug 30, 2017 at 4:52 PM, Apurva Mehta <apu...@confluent.io> wrote:
>
>> Hi Ted, int16 is sufficient. I forgot to specify initially. I have updated
>> the KIP.
>>
>> Thanks for pointing it out!
>> Apurva
>>
>> On Wed, Aug 30, 2017 at 4:43 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>> > For ProduceRequest v4, would int32 or int16 be enough for
>> idempotenceLevel
>> > ?
>> >
>> > Cheers
>> >
>> > On Wed, Aug 30, 2017 at 3:47 PM, Apurva Mehta <apu...@confluent.io>
>> wrote:
>> >
>> > > Thanks Ismael and Jason, I filed a separate KIP to solve the problems
>> > > identified through this discussion. I also incorporated Jason's
>> comments
>> > in
>> > > that document:
>> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> > > 192+%3A+Provide+cleaner+semantics+when+idempotence+is+enabled
>> > >
>> > > Please have a look,
>> > > Apurva
>> > >
>> > > On Tue, Aug 29, 2017 at 3:28 AM, Ismael Juma <ism...@juma.me.uk>
>> wrote:
>> > >
>> > > > Thanks for the proposals. I think they make sense and I also agree
>> with
>> > > > Jason's suggestions. Also, it would be good to include the updated
>> > > > ProduceRequest/Response schema in the KIP.
>> > > >
>> > > > Ismael
>> > > >
>> > > > On Tue, Aug 22, 2017 at 11:42 PM, Jason Gustafson <
>> ja...@confluent.io>
>> > > > wrote:
>> > > >
>> > > > > Thanks Apurva,
>> > > > >
>> > > > > On compatibility: I think the proposal makes sense. It's a pity
>> that
>> > we
>> > > > > can't support idempotence for 0.11.0.0 brokers in the "safe" mode
>> > even
>> > > if
>> > > > > it is supported by the broker. I can already imagine users
>> > complaining
>> > > > > about this, but I guess it's the consequence of missing the
>> impact of
>> > > > that
>> > > > > validation check and not thinking through the ultimate goal of
>> > enabling
>> > > > > idempotence by default. A couple minor comments:
>> > > > >
>> > > > > 1. Instead of "safe," Ismael suggested "requested" as an
>> alternative.
>> > > > That
>> > > > > seems to suggest more clearly that idempotence will only be used
>> when
>> > > the
>> > > > > broker supports it.
>> > > > > 2. Should we deprecate the "true" and "false" options? It's a
>> little
>> > > > weird
>> > > > > long term to support them in addition to the descriptive names.
>> > > > >
>> > > > > On the OutOfOrderSequence proposal: high-level, the design makes
>> > > sense. A
>> > > > > couple questions:
>> > > > >
>> > > > > 1. With this proposal, OutOfOrderSequence means that we must have
>> a
>> > > last
>> > > > > produced offset. Is the idea to expose that in the
>> > >

Re: [DISCUSS] KIP-185: Make exactly once in order delivery per partition the default producer setting

2017-08-30 Thread Roger Hoover
Sorry if this is a bit out of left field but can't help wondering...

One way to improve producer performance while still having good guarantees
would be to allow a setting between acks=1 and acks=all.  We could
introduce "acks=minIsr".  This is already the guarantee you get when the
ISR set shrinks below your replication factor.  Why not allow producers to
get notified when minIsr replication has been achieved even when the ISR
set is full?

For rep factor == 3 and min.in.sync.replicas == 2 and sizeOf(ISR) == 3:
* with acks=all, the remote time of each request will be max(lag of 2
followers) whereas
* with acks=minIsr, the remote time of each request will be min(lag of 2
followers)

Whatever your latency distribution is for replication, for any given remote
time (say 100 ms), twice as many requests take longer than that time with
acks=all vs acks=minIsr.

Thoughts?

Roger

On Wed, Aug 30, 2017 at 4:52 PM, Apurva Mehta  wrote:

> Hi Ted, int16 is sufficient. I forgot to specify initially. I have updated
> the KIP.
>
> Thanks for pointing it out!
> Apurva
>
> On Wed, Aug 30, 2017 at 4:43 PM, Ted Yu  wrote:
>
> > For ProduceRequest v4, would int32 or int16 be enough for
> idempotenceLevel
> > ?
> >
> > Cheers
> >
> > On Wed, Aug 30, 2017 at 3:47 PM, Apurva Mehta 
> wrote:
> >
> > > Thanks Ismael and Jason, I filed a separate KIP to solve the problems
> > > identified through this discussion. I also incorporated Jason's
> comments
> > in
> > > that document:
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 192+%3A+Provide+cleaner+semantics+when+idempotence+is+enabled
> > >
> > > Please have a look,
> > > Apurva
> > >
> > > On Tue, Aug 29, 2017 at 3:28 AM, Ismael Juma 
> wrote:
> > >
> > > > Thanks for the proposals. I think they make sense and I also agree
> with
> > > > Jason's suggestions. Also, it would be good to include the updated
> > > > ProduceRequest/Response schema in the KIP.
> > > >
> > > > Ismael
> > > >
> > > > On Tue, Aug 22, 2017 at 11:42 PM, Jason Gustafson <
> ja...@confluent.io>
> > > > wrote:
> > > >
> > > > > Thanks Apurva,
> > > > >
> > > > > On compatibility: I think the proposal makes sense. It's a pity
> that
> > we
> > > > > can't support idempotence for 0.11.0.0 brokers in the "safe" mode
> > even
> > > if
> > > > > it is supported by the broker. I can already imagine users
> > complaining
> > > > > about this, but I guess it's the consequence of missing the impact
> of
> > > > that
> > > > > validation check and not thinking through the ultimate goal of
> > enabling
> > > > > idempotence by default. A couple minor comments:
> > > > >
> > > > > 1. Instead of "safe," Ismael suggested "requested" as an
> alternative.
> > > > That
> > > > > seems to suggest more clearly that idempotence will only be used
> when
> > > the
> > > > > broker supports it.
> > > > > 2. Should we deprecate the "true" and "false" options? It's a
> little
> > > > weird
> > > > > long term to support them in addition to the descriptive names.
> > > > >
> > > > > On the OutOfOrderSequence proposal: high-level, the design makes
> > > sense. A
> > > > > couple questions:
> > > > >
> > > > > 1. With this proposal, OutOfOrderSequence means that we must have a
> > > last
> > > > > produced offset. Is the idea to expose that in the
> > > > > OutOfOrderSequenceException so that users know which data was lost?
> > > > > 2. Previously we discussed duplicate handling. Currently we raise
> > > > > OutOfOrderSequence if we happen to get a sequence number which is
> > > earlier
> > > > > than the sequence numbers we have cached. Alternatively, you
> > suggested
> > > we
> > > > > can return a separate DuplicateError for this case, which clients
> can
> > > > > ignore if they do not care about the offset. I think it might make
> > > sense
> > > > to
> > > > > include that here so that the OutOfOrderSequence error is
> > unambiguous.
> > > > >
> > > > > Finally, do you plan to roll these proposals into the current KIP
> or
> > do
> > > > > them separately? Probably makes sense to combine them since they
> both
> > > > > require a bump to the ProduceRequest.
> > > > >
> > > > > Thanks,
> > > > > Jason
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Aug 18, 2017 at 5:18 PM, Apurva Mehta  >
> > > > wrote:
> > > > >
> > > > > > Thanks Jason and Ismael.
> > > > > >
> > > > > > The message format problem is an acute one: if we enable
> > idempotence
> > > by
> > > > > > default, the UnsupportedVersionException when writing to topics
> > with
> > > > the
> > > > > > older message format would mean that our prescribed upgrade steps
> > > would
> > > > > not
> > > > > > work. I have detailed the problems and the solutions on this page
> > > > (linked
> > > > > > to from the wiki):
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/
> > > > > > 

Re: [VOTE] KIP-152 - Improve diagnostics for SASL authentication failures

2017-08-30 Thread Roger Hoover
Great.  Thank you, Rajini.

On Wed, Aug 30, 2017 at 7:53 AM, Rajini Sivaram <rajinisiva...@gmail.com>
wrote:

> Hi Roger,
>
> Thank you for the suggestions.
>
> I think we should have a separate JIRA to address logging improvements for
> authentication. That shouldn't need a KIP. The way the code is structured
> at the moment, SSL implementation is in the network package. And that makes
> it a bit messy to move authentication logs into a separate config.
>
> I have added an error_message field to SaslAuthenticate response.
>
> For those who have already voted, please let me know if you have any
> concerns about the new field.
>
> Thank you...
>
> Regards,
>
> Rajini
>
>
>
> On Tue, Aug 29, 2017 at 8:44 PM, Roger Hoover <roger.hoo...@gmail.com>
> wrote:
>
> > Hi Rajini,
> >
> > One more thought.  Maybe we should also add an error_message field in the
> > response like we do with CreateTopics response so that the server can
> > return an appropriate message that we can bubble up to the user.
> Examples
> > would be "Invalid username or password", "SASL Impersonation not
> allowed",
> > or "Your account has been locked, please contact cluster admin".
> >
> > Thanks,
> >
> > Roger
> >
> > On Tue, Aug 29, 2017 at 12:41 PM, Roger Hoover <roger.hoo...@gmail.com>
> > wrote:
> >
> > > Hi Rajini,
> > >
> > > The metrics in KIP-188 will provide counts across all users but the log
> > > could potentially be used to audit individual authentication events.  I
> > > think these would be useful at INFO level but if it's inconsistent with
> > the
> > > rest of Kafka, DEBUG is ok too.  The default log4j config for Kafka
> > > separates authorization logs.  It seems like a good idea to treat
> > > authentication logs the same way whether or not we choose DEBUG or
> INFO.
> > >
> > > https://github.com/apache/kafka/blob/trunk/config/log4j.
> > properties#L54-L58
> > >
> > > Cheers,
> > >
> > > Roger
> > >
> > > On Tue, Aug 29, 2017 at 10:51 AM, Rajini Sivaram <
> > rajinisiva...@gmail.com>
> > > wrote:
> > >
> > >> Hi Roger,
> > >>
> > >> If we are changing logging level for successful SASL authentications
> in
> > >> the
> > >> broker, we should probably do the same for SSL too. Since KIP-188
> > proposes
> > >> to add new metrics for successful and failed authentications which may
> > be
> > >> more useful for monitoring, do we really need info-level logging for
> > >> authentication? At the moment, there don't seem to be any
> per-connection
> > >> informational messages at info-level, but if you think it is useful,
> we
> > >> could do this in a separate JIRA. Let me know what you think.
> > >>
> > >> On Tue, Aug 29, 2017 at 1:09 PM, Roger Hoover <roger.hoo...@gmail.com
> >
> > >> wrote:
> > >>
> > >> > Just re-read the KIP and was wondering if you think INFO would be ok
> > for
> > >> > logging successful authentications?  They should be relatively
> > >> infrequent.
> > >> >
> > >> > On Tue, Aug 29, 2017 at 9:54 AM, Roger Hoover <
> roger.hoo...@gmail.com
> > >
> > >> > wrote:
> > >> >
> > >> > > +1 (non-binding).  Thanks, Rajini
> > >> > >
> > >> > > On Tue, Aug 29, 2017 at 2:10 AM, Ismael Juma <ism...@juma.me.uk>
> > >> wrote:
> > >> > >
> > >> > >> Thanks for the KIP, +1 (binding) from me.
> > >> > >>
> > >> > >> Ismael
> > >> > >>
> > >> > >> On Thu, Aug 24, 2017 at 6:29 PM, Rajini Sivaram <
> > >> > rajinisiva...@gmail.com>
> > >> > >> wrote:
> > >> > >>
> > >> > >> > Hi all,
> > >> > >> >
> > >> > >> > I would like to start vote on KIP-152 to improve diagnostics of
> > >> > >> > authentication failures and to update clients to treat
> > >> authentication
> > >> > >> > failures as fatal exceptions rather than transient errors:
> > >> > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >> > >> > 152+-+Improve+diagnostics+for+SASL+authentication+failures
> > >> > >> >
> > >> > >> > Thank you...
> > >> > >> >
> > >> > >> > Rajini
> > >> > >> >
> > >> > >>
> > >> > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>


Re: [VOTE] KIP-152 - Improve diagnostics for SASL authentication failures

2017-08-29 Thread Roger Hoover
Hi Rajini,

One more thought.  Maybe we should also add an error_message field in the
response like we do with CreateTopics response so that the server can
return an appropriate message that we can bubble up to the user.  Examples
would be "Invalid username or password", "SASL Impersonation not allowed",
or "Your account has been locked, please contact cluster admin".

Thanks,

Roger

On Tue, Aug 29, 2017 at 12:41 PM, Roger Hoover <roger.hoo...@gmail.com>
wrote:

> Hi Rajini,
>
> The metrics in KIP-188 will provide counts across all users but the log
> could potentially be used to audit individual authentication events.  I
> think these would be useful at INFO level but if it's inconsistent with the
> rest of Kafka, DEBUG is ok too.  The default log4j config for Kafka
> separates authorization logs.  It seems like a good idea to treat
> authentication logs the same way whether or not we choose DEBUG or INFO.
>
> https://github.com/apache/kafka/blob/trunk/config/log4j.properties#L54-L58
>
> Cheers,
>
> Roger
>
> On Tue, Aug 29, 2017 at 10:51 AM, Rajini Sivaram <rajinisiva...@gmail.com>
> wrote:
>
>> Hi Roger,
>>
>> If we are changing logging level for successful SASL authentications in
>> the
>> broker, we should probably do the same for SSL too. Since KIP-188 proposes
>> to add new metrics for successful and failed authentications which may be
>> more useful for monitoring, do we really need info-level logging for
>> authentication? At the moment, there don't seem to be any per-connection
>> informational messages at info-level, but if you think it is useful, we
>> could do this in a separate JIRA. Let me know what you think.
>>
>> On Tue, Aug 29, 2017 at 1:09 PM, Roger Hoover <roger.hoo...@gmail.com>
>> wrote:
>>
>> > Just re-read the KIP and was wondering if you think INFO would be ok for
>> > logging successful authentications?  They should be relatively
>> infrequent.
>> >
>> > On Tue, Aug 29, 2017 at 9:54 AM, Roger Hoover <roger.hoo...@gmail.com>
>> > wrote:
>> >
>> > > +1 (non-binding).  Thanks, Rajini
>> > >
>> > > On Tue, Aug 29, 2017 at 2:10 AM, Ismael Juma <ism...@juma.me.uk>
>> wrote:
>> > >
>> > >> Thanks for the KIP, +1 (binding) from me.
>> > >>
>> > >> Ismael
>> > >>
>> > >> On Thu, Aug 24, 2017 at 6:29 PM, Rajini Sivaram <
>> > rajinisiva...@gmail.com>
>> > >> wrote:
>> > >>
>> > >> > Hi all,
>> > >> >
>> > >> > I would like to start vote on KIP-152 to improve diagnostics of
>> > >> > authentication failures and to update clients to treat
>> authentication
>> > >> > failures as fatal exceptions rather than transient errors:
>> > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> > >> > 152+-+Improve+diagnostics+for+SASL+authentication+failures
>> > >> >
>> > >> > Thank you...
>> > >> >
>> > >> > Rajini
>> > >> >
>> > >>
>> > >
>> > >
>> >
>>
>
>


Re: [VOTE] KIP-152 - Improve diagnostics for SASL authentication failures

2017-08-29 Thread Roger Hoover
Hi Rajini,

The metrics in KIP-188 will provide counts across all users but the log
could potentially be used to audit individual authentication events.  I
think these would be useful at INFO level but if it's inconsistent with the
rest of Kafka, DEBUG is ok too.  The default log4j config for Kafka
separates authorization logs.  It seems like a good idea to treat
authentication logs the same way whether or not we choose DEBUG or INFO.

https://github.com/apache/kafka/blob/trunk/config/log4j.properties#L54-L58
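
If authentication logs were split out the same way, the addition might look
like the following. This is a sketch only, not part of Kafka's shipped
log4j.properties; the logger name `kafka.authentication.logger` is invented
here, modeled on the existing `kafka.authorizer.logger` entries at the lines
linked above.

```properties
# Hypothetical sketch: a dedicated authentication log, modeled on the
# authorizer appender in config/log4j.properties. The logger name
# "kafka.authentication.logger" is invented -- it is not an existing
# Kafka logger.
log4j.appender.authenticationAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.authenticationAppender.DatePattern='.'yyyy-MM-dd-HH
log4j.appender.authenticationAppender.File=${kafka.logs.dir}/kafka-authentication.log
log4j.appender.authenticationAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.authenticationAppender.layout.ConversionPattern=[%d] %p %m (%c)%n

log4j.logger.kafka.authentication.logger=INFO, authenticationAppender
log4j.additivity.kafka.authentication.logger=false
```

With additivity off, authentication events would go only to their own file,
exactly as authorizer events do today.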

Cheers,

Roger

On Tue, Aug 29, 2017 at 10:51 AM, Rajini Sivaram <rajinisiva...@gmail.com>
wrote:

> Hi Roger,
>
> If we are changing logging level for successful SASL authentications in the
> broker, we should probably do the same for SSL too. Since KIP-188 proposes
> to add new metrics for successful and failed authentications which may be
> more useful for monitoring, do we really need info-level logging for
> authentication? At the moment, there don't seem to be any per-connection
> informational messages at info-level, but if you think it is useful, we
> could do this in a separate JIRA. Let me know what you think.
>
> On Tue, Aug 29, 2017 at 1:09 PM, Roger Hoover <roger.hoo...@gmail.com>
> wrote:
>
> > Just re-read the KIP and was wondering if you think INFO would be ok for
> > logging successful authentications?  They should be relatively
> infrequent.
> >
> > On Tue, Aug 29, 2017 at 9:54 AM, Roger Hoover <roger.hoo...@gmail.com>
> > wrote:
> >
> > > +1 (non-binding).  Thanks, Rajini
> > >
> > > On Tue, Aug 29, 2017 at 2:10 AM, Ismael Juma <ism...@juma.me.uk>
> wrote:
> > >
> > >> Thanks for the KIP, +1 (binding) from me.
> > >>
> > >> Ismael
> > >>
> > >> On Thu, Aug 24, 2017 at 6:29 PM, Rajini Sivaram <
> > rajinisiva...@gmail.com>
> > >> wrote:
> > >>
> > >> > Hi all,
> > >> >
> > >> > I would like to start vote on KIP-152 to improve diagnostics of
> > >> > authentication failures and to update clients to treat
> authentication
> > >> > failures as fatal exceptions rather than transient errors:
> > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >> > 152+-+Improve+diagnostics+for+SASL+authentication+failures
> > >> >
> > >> > Thank you...
> > >> >
> > >> > Rajini
> > >> >
> > >>
> > >
> > >
> >
>


Re: [VOTE] KIP-152 - Improve diagnostics for SASL authentication failures

2017-08-29 Thread Roger Hoover
Just re-read the KIP and was wondering if you think INFO would be ok for
logging successful authentications?  They should be relatively infrequent.

On Tue, Aug 29, 2017 at 9:54 AM, Roger Hoover <roger.hoo...@gmail.com>
wrote:

> +1 (non-binding).  Thanks, Rajini
>
> On Tue, Aug 29, 2017 at 2:10 AM, Ismael Juma <ism...@juma.me.uk> wrote:
>
>> Thanks for the KIP, +1 (binding) from me.
>>
>> Ismael
>>
>> On Thu, Aug 24, 2017 at 6:29 PM, Rajini Sivaram <rajinisiva...@gmail.com>
>> wrote:
>>
>> > Hi all,
>> >
>> > I would like to start vote on KIP-152 to improve diagnostics of
>> > authentication failures and to update clients to treat authentication
>> > failures as fatal exceptions rather than transient errors:
>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> > 152+-+Improve+diagnostics+for+SASL+authentication+failures
>> >
>> > Thank you...
>> >
>> > Rajini
>> >
>>
>
>
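The fatal-vs-transient distinction that KIP-152 argues for can be sketched with an illustrative retry loop (plain Python pseudo-client, not the actual Kafka client API; the exception names here are hypothetical stand-ins):

```python
# Illustrative retry loop showing the semantics KIP-152 argues for:
# authentication failures abort immediately instead of being retried
# like transient broker errors. Not the real Kafka client API.
class AuthenticationError(Exception): pass
class TransientError(Exception): pass

def send_with_retries(send, max_retries=3):
    for _ in range(max_retries):
        try:
            return send()
        except AuthenticationError:
            raise              # fatal: retrying cannot succeed, fail fast
        except TransientError:
            continue           # e.g. leader election in progress: retry
    raise TransientError("retries exhausted")

attempts = []
def flaky_send():
    attempts.append(1)
    if len(attempts) < 2:
        raise TransientError()
    return "ok"

assert send_with_retries(flaky_send) == "ok" and len(attempts) == 2
```

Treating an authentication failure like the transient error above would instead spin through all retries before surfacing a misleading timeout, which is exactly the diagnostics problem the KIP addresses.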


Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-29 Thread Roger Hoover
> > > > > Ismael/Manikumar,
> > > > > >
> > > > > > Agree that version as a String attribute makes more sense.
> > > > Unfortunately,
> > > > > > the whole KafkaMetric implementation is written around a single
> > > > "double"
> > > > > > type, so introducing a new type is a big change. But I suppose it
> > can
> > > > be
> > > > > > done. I have updated the KIP.
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > Rajini
> > > > > >
> > > > > >
> > > > > > On Fri, Aug 18, 2017 at 7:42 AM, Manikumar <
> > > manikumar.re...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > I agree it will be good if we can add  "commit id/version" as
> an
> > > > > > > attribute value.
> > > > > > > It will be easy to parse. But as of now, KafkaMetric supports
> > only
> > > > > > > numerical values.
> > > > > > >
> > > > > > > On Fri, Aug 18, 2017 at 5:49 AM, Ismael Juma <
> ism...@juma.me.uk>
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi Rajini,
> > > > > > > >
> > > > > > > > About the gauges, I was thinking that the attribute would be
> > the
> > > > > value
> > > > > > > > (i.e. commit id or version). I understand that Kafka Metrics
> > > > doesn't
> > > > > > > > support this (unlike Yammer Metrics), but would it make sense
> > to
> > > > add?
> > > > > > > >
> > > > > > > > Ismael
> > > > > > > >
> > > > > > > > On Thu, Aug 17, 2017 at 2:54 PM, Rajini Sivaram <
> > > > > > rajinisiva...@gmail.com
> > > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Ismael,
> > > > > > > > >
> > > > > > > > > Thank you for the review.
> > > > > > > > >
> > > > > > > > > 1. Agree on keeping it simple with dynamic registration and
> > no
> > > > > > expiry.
> > > > > > > > Will
> > > > > > > > > wait for Jun's feedback before updating KIP.
> > > > > > > > > 2. I have switched to two metrics for commit-id and version
> > > (not
> > > > > sure
> > > > > > > if
> > > > > > > > it
> > > > > > > > > matches what you meant). I also added the client-id tag
> which
> > > is
> > > > > used
> > > > > > > in
> > > > > > > > > all metrics from clients.
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > >
> > > > > > > > > Rajini
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thu, Aug 17, 2017 at 10:55 AM, Ismael Juma <
> > > ism...@juma.me.uk
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for the KIP, Rajini. I think this is helpful too.
> A
> > > few
> > > > > > minor
> > > > > > > > > > comments.
> > > > > > > > > >
> > > > > > > > > > 1. About the number of metrics and expiration, if we
> > > > dynamically
> > > > > > > > register
> > > > > > > > > > metrics for the error codes, the number is likely to be
> > much
> > > > > lower
> > > > > > > than
> > > > > > > > > > 30*30, probably less than 100. If we were using Kafka
> > Metrics
> > > > for
> > > > > > > this,
> > > > > > > > > we
> > > > > > > > > > could easily add a long expiration period to be
> > conservative,
> > > > > but I
> > > > > > > am
> > > > 

Re: [VOTE] KIP-152 - Improve diagnostics for SASL authentication failures

2017-08-29 Thread Roger Hoover
+1 (non-binding).  Thanks, Rajini

On Tue, Aug 29, 2017 at 2:10 AM, Ismael Juma  wrote:

> Thanks for the KIP, +1 (binding) from me.
>
> Ismael
>
> On Thu, Aug 24, 2017 at 6:29 PM, Rajini Sivaram 
> wrote:
>
> > Hi all,
> >
> > I would like to start vote on KIP-152 to improve diagnostics of
> > authentication failures and to update clients to treat authentication
> > failures as fatal exceptions rather than transient errors:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 152+-+Improve+diagnostics+for+SASL+authentication+failures
> >
> > Thank you...
> >
> > Rajini
> >
>


Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-17 Thread Roger Hoover
Rajini,

The table is super helpful.  Thank you.

On Thu, Aug 17, 2017 at 2:16 AM, Rajini Sivaram <rajinisiva...@gmail.com>
wrote:

> Hi Roger,
>
> Thank you for the review. I have added a table with the scope of errors
> counted for each request.
>
> Regards,
>
> Rajini
>
> On Thu, Aug 17, 2017 at 12:05 AM, Roger Hoover <roger.hoo...@gmail.com>
> wrote:
>
> > I think it would be useful to make clear somewhere for each metric, the
> level
> > at which it's counted.  I don't know all the details of the Kafka
> protocol
> > but it might be something like
> >
> > ProduceRequest, Fetch Request - counted at per-partition level
> > All other requests are 1:1 with client requests?
> >
> > Cheers,
> >
> > Roger
> >
> > On Wed, Aug 16, 2017 at 4:02 PM, Roger Hoover <roger.hoo...@gmail.com>
> > wrote:
> >
> > > Rajini,
> > >
> > > Thank you for the KIP.  These are very helpful additions.  One question
> > on
> > > the error code metrics:
> > >
> > > Will the total error counting happen at the level of topic
> partition?
> > > For example, if a single ProduceRequest contains messages to append to
> 3
> > > partitions and say all 3 appends are successful, the counter
> > > for kafka.network:type=RequestMetrics,name=ErrorsPerSec,request=
> > ProduceRequest,error=0
> > > will be incremented by 3?
> > >
> > > Thanks,
> > >
> > > Roger
> > >
> > > On Wed, Aug 16, 2017 at 12:10 PM, Rajini Sivaram <
> > rajinisiva...@gmail.com>
> > > wrote:
> > >
> > >> I have created a KIP to add some additional metrics to support health
> > >> checks:
> > >>
> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-188+-+
> > >> Add+new+metrics+to+support+health+checks
> > >>
> > >> Feedback and suggestions are welcome.
> > >>
> > >> Regards,
> > >>
> > >> Rajini
> > >>
> > >
> > >
> >
>


Re: [DISCUSS] KIP-187 - Add cumulative count metric for all Kafka rate metrics

2017-08-16 Thread Roger Hoover
Rajini,

Thank you.  This is very useful.  Grouping metrics by prefixing the name
instead of making them separate MBeans is not quite as nice, but it seems like
a good compromise for backward compatibility.

Cheers,

Roger
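The accuracy benefit of a cumulative count can be put concretely: from two samples of a monotonically increasing count, a downstream system can compute the exact average rate over its own scrape window, instead of relying on the broker's internally sampled rate. A minimal sketch (plain Python; the metric values are made up for illustration, not from any real broker):

```python
# Sketch: why a cumulative count makes downstream rate computation exact.
# Two scrapes of a monotonically increasing "-total" style metric are
# enough to derive the average rate over exactly the scrape interval.
def rate_from_counts(count_t0, count_t1, t0_secs, t1_secs):
    return (count_t1 - count_t0) / (t1_secs - t0_secs)

# Hypothetical cumulative companion of a rate metric, sampled 30s apart:
r = rate_from_counts(count_t0=12_000, count_t1=15_000, t0_secs=0, t1_secs=30)
assert r == 100.0  # exactly 100 events/sec over the window
```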

On Wed, Aug 16, 2017 at 5:35 AM, Rajini Sivaram 
wrote:

> Sorry, pressed send by mistake.
>
> The KIP is here:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 187+-+Add+cumulative+count+metric+for+all+Kafka+rate+metrics
>
> Feedback and suggestions are appreciated.
>
> Thank you...
>
> Rajini
>
>
> On Wed, Aug 16, 2017 at 1:34 PM, Rajini Sivaram 
> wrote:
>
> > Hi all,
> >
> > I have posted a KIP to add a cumulative count attribute for all Kafka
> rate
> > metrics to make downstream processing simpler, more accurate and
> flexible:
> >
> >
>


Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-16 Thread Roger Hoover
I think it would be useful to make clear somewhere for each metric, the level
at which it's counted.  I don't know all the details of the Kafka protocol
but it might be something like

ProduceRequest, Fetch Request - counted at per-partition level
All other requests are 1:1 with client requests?

Cheers,

Roger

On Wed, Aug 16, 2017 at 4:02 PM, Roger Hoover <roger.hoo...@gmail.com>
wrote:

> Rajini,
>
> Thank you for the KIP.  These are very helpful additions.  One question on
> the error code metrics:
>
> Will the total error counting happen at the level of topic partition?
> For example, if a single ProduceRequest contains messages to append to 3
> partitions and say all 3 appends are successful, the counter
> for 
> kafka.network:type=RequestMetrics,name=ErrorsPerSec,request=ProduceRequest,error=0
> will be incremented by 3?
>
> Thanks,
>
> Roger
>
> On Wed, Aug 16, 2017 at 12:10 PM, Rajini Sivaram <rajinisiva...@gmail.com>
> wrote:
>
>> I have created a KIP to add some additional metrics to support health
>> checks:
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-188+-+
>> Add+new+metrics+to+support+health+checks
>>
>> Feedback and suggestions are welcome.
>>
>> Regards,
>>
>> Rajini
>>
>
>


Re: [DISCUSS] KIP-188 - Add new metrics to support health checks

2017-08-16 Thread Roger Hoover
Rajini,

Thank you for the KIP.  These are very helpful additions.  One question on
the error code metrics:

Will the total error counting happen at the level of topic partition?
For example, if a single ProduceRequest contains messages to append to 3
partitions and say all 3 appends are successful, the counter
for 
kafka.network:type=RequestMetrics,name=ErrorsPerSec,request=ProduceRequest,error=0
will be incremented by 3?

Thanks,

Roger

On Wed, Aug 16, 2017 at 12:10 PM, Rajini Sivaram 
wrote:

> I have created a KIP to add some additional metrics to support health
> checks:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 188+-+Add+new+metrics+to+support+health+checks
>
> Feedback and suggestions are welcome.
>
> Regards,
>
> Rajini
>
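To make the per-partition counting question above concrete, here is a toy model (plain Python, not Kafka code) of an error counter keyed by (request, error code) that is incremented once per partition append rather than once per request — the semantics Roger is asking about:

```python
from collections import Counter

# Toy model of a broker-side error counter keyed by (request, error-code).
# Not Kafka code -- it only illustrates the per-partition counting
# semantics being asked about: one increment per partition append.
errors_per_sec = Counter()

def handle_produce_request(partition_results):
    """partition_results maps partition -> error code (0 = NONE)."""
    for _partition, error_code in partition_results.items():
        errors_per_sec[("ProduceRequest", error_code)] += 1

# A single ProduceRequest appending to 3 partitions, all successful:
handle_produce_request({"topic-0": 0, "topic-1": 0, "topic-2": 0})
assert errors_per_sec[("ProduceRequest", 0)] == 3  # incremented by 3, not 1
```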


Re: [ANNOUNCE] New Kafka PMC member Ismael Juma

2017-07-05 Thread Roger Hoover
Well deserved, indeed!  Congrats, Ismael.

On Wed, Jul 5, 2017 at 3:24 PM, Damian Guy  wrote:

> Congratulations Ismael! Very well deserved.
> Cheers,
> Damian
> On Wed, 5 Jul 2017 at 22:54, Dong Lin  wrote:
>
> > Congratulations Ismael!
> >
> > On Wed, Jul 5, 2017 at 1:55 PM, Jun Rao  wrote:
> >
> > > Hi, Everyone,
> > >
> > > Ismael Juma has been active in the Kafka community since he became
> > > a Kafka committer about a year ago. I am glad to announce that Ismael
> is
> > > now a member of Kafka PMC.
> > >
> > > Congratulations, Ismael!
> > >
> > > Jun
> > >
> >
>


Re: [VOTE] KIP-153 (separating replication traffic from BytesOutPerSec metric)

2017-05-08 Thread Roger Hoover
+1

Sent from my iPhone

> On May 8, 2017, at 5:00 AM, Edoardo Comar  wrote:
> 
> +1 
> Many thanks Jun
> --
> Edoardo Comar
> IBM MessageHub
> eco...@uk.ibm.com
> IBM UK Ltd, Hursley Park, SO21 2JN
> 
> IBM United Kingdom Limited Registered in England and Wales with number 
> 741598 Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 
> 3AU
> 
> 
> 
> From:   Jun Rao 
> To: "dev@kafka.apache.org" 
> Date:   07/05/2017 22:40
> Subject:[VOTE] KIP-153 (separating replication traffic from 
> BytesOutPerSec metric)
> 
> 
> 
> Hi, Everyone,
> 
> Since this is a relatively simple change, I would like to start the voting
> process for KIP-153 : Include only client traffic in BytesOutPerSec 
> metric.
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-153+%
> 3A+Include+only+client+traffic+in+BytesOutPerSec+metric
> 
> The vote will run for a minimum of 72 hours.
> 
> Thanks,
> 
> Jun
> 
> 
> 


Re: [DISCUSS] KIP-153 : Include only client traffic in BytesOutPerSec metric

2017-05-05 Thread Roger Hoover
Very helpful.  Thank you, Jun.

On Fri, May 5, 2017 at 4:42 PM, Guozhang Wang  wrote:

> Jun,
>
> Thanks for the KIP, LGTM.
>
> Guozhang
>
> On Fri, May 5, 2017 at 3:38 PM, Ismael Juma  wrote:
>
> > Thanks Jun, looks good to me.
> >
> > Ismael
> >
> > On Fri, May 5, 2017 at 11:35 PM, Jun Rao  wrote:
> >
> > > Hi, Ismael,
> > >
> > > Good point. Updated the KIP with ReplicationBytesInPerSec.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Fri, May 5, 2017 at 3:16 PM, Ismael Juma  wrote:
> > >
> > > > Thanks for the KIP, Jun. Good to fix this inconsistency. Do I
> > understand
> > > > correctly that we are introducing ReplicationBytesOutPerSec, but not
> > > > ReplicationBytesInPerSec?
> > > > If so, what's the reason?
> > > >
> > > > Ismael
> > > >
> > > > On Fri, May 5, 2017 at 11:11 PM, Jun Rao  wrote:
> > > >
> > > > > Hi, Everyone,
> > > > >
> > > > > We created "KIP-153 : Include only client traffic in BytesOutPerSec
> > > > > metric".
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > 153+%3A+Include+only+client+traffic+in+BytesOutPerSec+metric
> > > > >
> > > > > Please take a look and provide your feedback.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > >
> > >
> >
>
>
>
> --
> -- Guozhang
>


[jira] [Commented] (KAFKA-3795) Transient system test failure upgrade_test.TestUpgrade

2017-04-03 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953759#comment-15953759
 ] 

Roger Hoover commented on KAFKA-3795:
-

Happened again: 
http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2017-04-03--001.1491220440--apache--trunk--bdf4cba/

{noformat}
test_id:
kafkatest.tests.core.upgrade_test.TestUpgrade.test_upgrade.from_kafka_version=0.9.0.1.to_message_format_version=None.security_protocol=SASL_SSL.compression_types=.none
status: FAIL
run time:   4 minutes 4.673 seconds


199680 acked message did not make it to the Consumer. They are: 538129, 
538132, 538135, 538138, 538140, 538141, 538143, 538144, 538146, 538147, 538149, 
538150, 538152, 538153, 538155, 538156, 538158, 538159, 538161, 538162...plus 
199660 more. Total Acked: 331954, Total Consumed: 138002. The first 1000 
missing messages were validated to ensure they are in Kafka's data files. 1000 
were missing. This suggests data loss. Here are some of the messages not found 
in the data files: [538624, 538625, 538626, 538627, 538628, 538629, 538630, 
538631, 538632, 538633]

Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 123, in run
data = self.run_test()
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 176, in run_test
return self.test_context.function(self.test)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
 line 321, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/upgrade_test.py",
 line 125, in test_upgrade
self.run_produce_consume_validate(core_test_action=lambda: 
self.perform_upgrade(from_kafka_version,
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
 line 118, in run_produce_consume_validate
self.validate()
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
 line 188, in validate
assert success, msg
AssertionError: 199680 acked message did not make it to the Consumer. They are: 
538129, 538132, 538135, 538138, 538140, 538141, 538143, 538144, 538146, 538147, 
538149, 538150, 538152, 538153, 538155, 538156, 538158, 538159, 538161, 
538162...plus 199660 more. Total Acked: 331954, Total Consumed: 138002. The 
first 1000 missing messages were validated to ensure they are in Kafka's data 
files. 1000 were missing. This suggests data loss. Here are some of the 
messages not found in the data files: [538624, 538625, 538626, 538627, 538628, 
538629, 538630, 538631, 538632, 538633]
{noformat}

> Transient system test failure upgrade_test.TestUpgrade
> --
>
> Key: KAFKA-3795
> URL: https://issues.apache.org/jira/browse/KAFKA-3795
> Project: Kafka
>  Issue Type: Bug
>  Components: system tests
>Reporter: Jason Gustafson
>  Labels: reliability
>
> From a recent build running on the 0.10.0 branch:
> {code}
> test_id:
> 2016-06-06--001.kafkatest.tests.core.upgrade_test.TestUpgrade.test_upgrade.from_kafka_version=0.9.0.1.to_message_format_version=0.9.0.1.compression_types=.snappy.new_consumer=True
> status: FAIL
> run time:   3 minutes 29.166 seconds
> 3522 acked message did not make it to the Consumer. They are: 476524, 
> 476525, 476527, 476528, 476530, 476531, 476533, 476534, 476536, 476537, 
> 476539, 476540, 476542, 476543, 476545, 476546, 476548, 476549, 476551, 
> 476552, ...plus 3482 more. Total Acked: 110437, Total Consumed: 127470. The 
> first 1000 missing messages were validated to ensure they are in Kafka's data 
> files. 1000 were missing. This suggests data loss. Here are some of the 
> messages not found in the data files: [477184, 477185, 477187, 477188, 
> 477190, 477191, 477193, 477194, 477196, 477197]
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py",
>  line 106, in run_all_tests
> data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py"

Re: [DISCUSS] KIP-136: Add Listener name and Security Protocol name to SelectorMetrics tags

2017-03-30 Thread Roger Hoover
Edo,

Thanks for the proposal.  This looks great to me.

Cheers,

Roger

On Thu, Mar 30, 2017 at 8:51 AM, Edoardo Comar  wrote:

> Hi all,
>
> We created KIP-136: Add Listener name and Security Protocol name to
> SelectorMetrics tags
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 136%3A+Add+Listener+name+and+Security+Protocol+name+to+
> SelectorMetrics+tags
>
> Please help review the KIP. Your feedback is appreciated!
>
> cheers,
> Edo
> --
> Edoardo Comar
> IBM MessageHub
> eco...@uk.ibm.com
> IBM UK Ltd, Hursley Park, SO21 2JN
>
>


[jira] [Updated] (KAFKA-4755) SimpleBenchmark consume test fails for streams

2017-03-28 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover updated KAFKA-4755:

Priority: Blocker  (was: Major)

> SimpleBenchmark consume test fails for streams
> --
>
> Key: KAFKA-4755
> URL: https://issues.apache.org/jira/browse/KAFKA-4755
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Eno Thereska
>Assignee: Eno Thereska
>Priority: Blocker
> Fix For: 0.11.0.0
>
>
> This occurred Feb 10th 2017:
> kafkatest.benchmarks.streams.streams_simple_benchmark_test.StreamsSimpleBenchmarkTest.test_simple_benchmark.test=consume.scale=1
> status: FAIL
> run time:   7 minutes 36.712 seconds
> Streams Test process on ubuntu@worker2 took too long to exit
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
>  line 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/benchmarks/streams/streams_simple_benchmark_test.py",
>  line 86, in test_simple_benchmark
> self.driver[num].wait()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/services/streams.py",
>  line 102, in wait
> self.wait_node(node, timeout_sec)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/services/streams.py",
>  line 106, in wait_node
> wait_until(lambda: not node.account.alive(pid), timeout_sec=timeout_sec, 
> err_msg="Streams Test process on " + str(node.account) + " took too long to 
> exit")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Streams Test process on ubuntu@worker2 took too long to exit



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Reopened] (KAFKA-4755) SimpleBenchmark consume test fails for streams

2017-03-28 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover reopened KAFKA-4755:
-

Happened again.  Logs here:  
http://confluent-kafka-0-10-2-system-test-results.s3-us-west-2.amazonaws.com/2017-03-28--001.1490697484--apache--0.10.2--1e4cab7/StreamsSimpleBenchmarkTest/test_simple_benchmark/212.tgz




Streams Test process on ubuntu@worker2 took too long to exit
Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 123, in run
data = self.run_test()
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 176, in run_test
return self.test_context.function(self.test)
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/benchmarks/streams/streams_simple_benchmark_test.py",
 line 48, in test_simple_benchmark
self.driver.wait()
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/services/streams.py",
 line 99, in wait
self.wait_node(node, timeout_sec)
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/services/streams.py",
 line 103, in wait_node
wait_until(lambda: not node.account.alive(pid), timeout_sec=timeout_sec, 
err_msg="Streams Test process on " + str(node.account) + " took too long to 
exit")
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
 line 36, in wait_until
raise TimeoutError(err_msg)
TimeoutError: Streams Test process on ubuntu@worker2 took too long to exit



> SimpleBenchmark consume test fails for streams
> --
>
> Key: KAFKA-4755
> URL: https://issues.apache.org/jira/browse/KAFKA-4755
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Eno Thereska
>Assignee: Eno Thereska
> Fix For: 0.11.0.0
>
>
> This occurred Feb 10th 2017:
> kafkatest.benchmarks.streams.streams_simple_benchmark_test.StreamsSimpleBenchmarkTest.test_simple_benchmark.test=consume.scale=1
> status: FAIL
> run time:   7 minutes 36.712 seconds
> Streams Test process on ubuntu@worker2 took too long to exit
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
>  line 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/benchmarks/streams/streams_simple_benchmark_test.py",
>  line 86, in test_simple_benchmark
> self.driver[num].wait()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/services/streams.py",
>  line 102, in wait
> self.wait_node(node, timeout_sec)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/services/streams.py",
>  line 106, in wait_node
> wait_until(lambda: not node.account.alive(pid), timeout_sec=timeout_sec, 
> err_msg="Streams Test process on " + str(node.account) + " took too long to 
> exit")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Streams Test process on ubuntu@worker2 took too long to exit





[jira] [Commented] (KAFKA-4689) OffsetValidationTest fails validation with "Current position greater than the total number of consumed records"

2017-03-28 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945543#comment-15945543
 ] 

Roger Hoover commented on KAFKA-4689:
-

Happened again. Logs here 
http://confluent-kafka-0-10-2-system-test-results.s3-us-west-2.amazonaws.com/2017-03-28--001.1490697484--apache--0.10.2--1e4cab7/OffsetValidationTest/test_consumer_bounce/clean_shutdown%3DFalse.bounce_mode%3Drolling/49.tgz


test_id:
kafkatest.tests.client.consumer_test.OffsetValidationTest.test_consumer_bounce.clean_shutdown=False.bounce_mode=rolling
status: FAIL
run time:   3 minutes 32.756 seconds


Current position 79302 greater than the total number of consumed records 
79299
Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 123, in run
data = self.run_test()
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 176, in run_test
return self.test_context.function(self.test)
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
 line 321, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/client/consumer_test.py",
 line 159, in test_consumer_bounce
(consumer.current_position(partition), consumer.total_consumed())
AssertionError: Current position 79302 greater than the total number of 
consumed records 79299


> OffsetValidationTest fails validation with "Current position greater than the 
> total number of consumed records"
> ---
>
> Key: KAFKA-4689
> URL: https://issues.apache.org/jira/browse/KAFKA-4689
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, system tests
>Reporter: Ewen Cheslack-Postava
>Assignee: Jason Gustafson
>  Labels: system-test-failure
>
> {quote}
> 
> test_id:
> kafkatest.tests.client.consumer_test.OffsetValidationTest.test_consumer_bounce.clean_shutdown=False.bounce_mode=all
> status: FAIL
> run time:   1 minute 49.834 seconds
> Current position greater than the total number of consumed records
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
>  line 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/client/consumer_test.py",
>  line 157, in test_consumer_bounce
> "Current position greater than the total number of consumed records"
> AssertionError: Current position greater than the total number of consumed 
> records
> {quote}
> See also 
> https://issues.apache.org/jira/browse/KAFKA-3513?focusedCommentId=15791790&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15791790
>  which is another instance of this bug, which indicates the issue goes back 
> at least as far as 1/17/2017. Note that I don't think we've seen this in 
> 0.10.1 yet.





Re: [VOTE] KIP-124: Request rate quotas

2017-03-21 Thread Roger Hoover
Rajini,

This is great.  Thank you.  +1 (non-binding)

Roger

On Tue, Mar 21, 2017 at 8:55 AM, Ismael Juma  wrote:

> Rajini,
>
> Thanks for the proposal and for addressing the (sometimes contradictory)
> feedback. :) +1 (binding) from me.
>
> Ismael
>
> On Mon, Mar 20, 2017 at 2:47 PM, Rajini Sivaram 
> wrote:
>
> > I would like to initiate the voting process for KIP-124:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 124+-+Request+rate+quotas
> >
> > The KIP proposes to add request processing time quota to limit CPU
> > utilization at <user>, <client-id> or <user, client-id> levels
> > similar to the current bandwidth limiting quotas. The quota takes into
> > account the total time spent by the user/client on request handler and
> > network threads within a quota window. Quota will be configured as a
> > per-thread percentage value, which typically indicates the percentage of
> > CPU cores allocated to the user/client.
> >
> > The discussion thread is here:
> >
> >
> > https://lists.apache.org/thread.html/46c7bbc8f381ebe718b3cce6ed8bdf
> > 3745df22b0bd88020d70c99813@%3Cdev.kafka.apache.org%3E
> >
> > Many thanks to everyone for the feedback and suggestions so far.
> >
> > Regards,
> >
> > Rajini
> >
>


Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-22 Thread Roger Hoover
Great to see this KIP and the excellent discussion.

To me, Jun's suggestion makes sense.  If my application is allocated 1
request handler unit, then it's as if I have a Kafka broker with a single
request handler thread dedicated to me.  That's the most I can use, at
least.  That allocation doesn't change even if an admin later increases the
size of the request thread pool on the broker.  It's similar to the CPU
abstraction that VMs and containers get from hypervisors or OS schedulers.
While different client access patterns can use wildly different amounts of
request thread resources per request, a given application will generally
have a stable access pattern and can figure out empirically how many
"request thread units" it needs to meet its throughput/latency goals.

Cheers,

Roger
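Jun's later concern that a percentage quota is not absolute can be put in numbers (a toy calculation, not Kafka code, under the assumption that the percentage applies across the whole handler pool): the same 10% quota grants twice the absolute capacity if the admin doubles the request handler pool, whereas a quota expressed directly in request-handler units stays fixed:

```python
# Toy arithmetic for the percentage-vs-absolute quota discussion.
def capacity_from_percent(percent, num_handler_threads):
    # Effective capacity in "handler-thread units" under a pool-wide
    # percentage quota: percent of every thread in the pool.
    return percent * num_handler_threads / 100.0

# 10% quota with 8 request handler threads:
assert capacity_from_percent(10, 8) == 0.8
# Admin doubles the pool to 16 threads: the same config now grants double.
assert capacity_from_percent(10, 16) == 1.6
```

An absolute quota of, say, 1.0 handler units would be unaffected by resizing the pool, which is the behavior Jun's suggestion aims for.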

On Wed, Feb 22, 2017 at 8:53 AM, Jun Rao  wrote:

> Hi, Rajini,
>
> Thanks for the updated KIP. A few more comments.
>
> 1. A concern of request_time_percent is that it's not an absolute value.
> Let's say you give a user a 10% limit. If the admin doubles the number of
> request handler threads, that user now actually has twice the absolute
> capacity. This may confuse people a bit. So, perhaps setting the quota
> based on an absolute request thread unit is better.
>
> 2. ControlledShutdownRequest is also an inter-broker request and needs to
> be excluded from throttling.
>
> 3. Implementation wise, I am wondering if it's simpler to apply the request
> time throttling first in KafkaApis.handle(). Otherwise, we will need to add
> the throttling logic in each type of request.
>
> Thanks,
>
> Jun
>
> On Wed, Feb 22, 2017 at 5:58 AM, Rajini Sivaram 
> wrote:
>
> > Jun,
> >
> > Thank you for the review.
> >
> > I have reverted to the original KIP that throttles based on request
> handler
> > utilization. At the moment, it uses percentage, but I am happy to change
> to
> > a fraction (out of 1 instead of 100) if required. I have added the
> examples
> > from this discussion to the KIP. Also added a "Future Work" section to
> > address network thread utilization. The configuration is named
> > "request_time_percent" with the expectation that it can also be used as
> the
> > limit for network thread utilization when that is implemented, so that
> > users have to set only one config for the two and not have to worry about
> > the internal distribution of the work between the two thread pools in
> > Kafka.
> >
> >
> > Regards,
> >
> > Rajini
> >
> >
> > On Wed, Feb 22, 2017 at 12:23 AM, Jun Rao  wrote:
> >
> > > Hi, Rajini,
> > >
> > > Thanks for the proposal.
> > >
> > > The benefit of using the request processing time over the request rate
> is
> > > exactly what people have said. I will just expand that a bit. Consider
> > the
> > > following case. The producer sends a produce request with a 10MB
> message
> > > but compressed to 100KB with gzip. The decompression of the message on
> > the
> > > broker could take 10-15 seconds, during which time, a request handler
> > > thread is completely blocked. In this case, neither the byte-in quota
> nor
> > > the request rate quota may be effective in protecting the broker.
> > Consider
> > > another case. A consumer group starts with 10 instances and later on
> > > switches to 20 instances. The request rate will likely double, but the
> > > actual load on the broker may not double since each fetch request
> only
> > > contains half of the partitions. Request rate quota may not be easy to
> > > configure in this case.
> > >
> > > What we really want is to be able to prevent a client from using too
> much
> > > of the server side resources. In this particular KIP, this resource is
> > the
> > > capacity of the request handler threads. I agree that it may not be
> > > intuitive for the users to determine how to set the right limit.
> However,
> > > this is not completely new and has been done in the container world
> > > already. For example, Linux cgroup (https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html) has the concept of
> > > cpu.cfs_quota_us,
> > > which specifies the total amount of time in microseconds for which all
> > > tasks in a cgroup can run during a one second period. We can
> potentially
> > > model the request handler threads in a similar way. For example, each
> > > request handler thread can be 1 request handler unit and the admin can
> > > configure a limit on how many units (say 0.01) a client can have.
> > >
> > > Regarding not throttling the internal broker to broker requests. We
> could
> > > do that. Alternatively, we could just let the admin configure a high
> > limit
> > > for the kafka user (it may not be able to do that easily based on
> > clientId
> > > though).
> > >
> > > Ideally we want to be able to protect the utilization of the network
> > thread
> > > pool too. The difficulty is mostly what Rajini said: (1) The 

Re: [DISCUSS] KIP-84: Support SASL/SCRAM mechanisms

2017-02-15 Thread Roger Hoover
Yes.  Thank you, Ismael.

On Wed, Feb 8, 2017 at 2:30 AM, Ismael Juma <ism...@juma.me.uk> wrote:

> Hi Roger,
>
> Sorry for the delay. SCRAM is specified by:
>
> https://tools.ietf.org/html/rfc5802
>
> The following quote is relevant:
>
> A SCRAM mechanism name is a string "SCRAM-" followed by the
> > uppercased name of the underlying hash function taken from the IANA
> > "Hash Function Textual Names" registry (see http://www.iana.org),
> > optionally followed by the suffix "-PLUS" (see below)
>
>
> And:
>
> "md2" 1.2.840.113549.2.2 [RFC3279]
> > "md5" 1.2.840.113549.2.5 [RFC3279]
> > "sha-1" 1.3.14.3.2.26 [RFC3279]
> > "sha-224" 2.16.840.1.101.3.4.2.4 [RFC4055]
> > "sha-256" 2.16.840.1.101.3.4.2.1 [RFC4055]
> > "sha-384" 2.16.840.1.101.3.4.2.2 [RFC4055]
> > "sha-512" 2.16.840.1.101.3.4.2.3 [RFC4055]
>
>
> https://www.iana.org/assignments/hash-function-text-names/hash-function-text-names.xhtml
>
> As you see, bcrypt is not an option for the current spec. The naming of the
> mechanisms would be a bit misleading if support for bcrypt was added
> (SCRAM-PBKDF2-SHA512, SCRAM-BCRYPT*, etc. would be better).
>
> Does that make sense?
>
> Ismael
>
> On Tue, Jan 24, 2017 at 7:26 PM, Roger Hoover <roger.hoo...@gmail.com>
> wrote:
>
> > Thanks, Ismael.  Just curious, why does it not make sense to do bcrypt
> > in the context of SCRAM?
> >
> > On Mon, Jan 23, 2017 at 3:54 PM, Ismael Juma <ism...@juma.me.uk> wrote:
> >
> > > Hi Roger,
> > >
> > > SCRAM uses the PBKDF2 mechanism, here's a comparison between PBKDF2 and
> > > bcrypt:
> > >
> > > http://security.stackexchange.com/questions/4781/do-any-security-experts-recommend-bcrypt-for-password-storage/6415#6415
> > >
> > > It may be worth supporting bcrypt, but not sure it would make sense to
> do
> > > it in the context of SCRAM.
> > >
> > > A minor correction: the KIP includes SCRAM-SHA-256 and SCRAM-SHA-512
> (not
> > > SCRAM-SHA-1).
> > >
> > > Ismael
> > >
> > > On Mon, Jan 23, 2017 at 10:49 PM, Roger Hoover <roger.hoo...@gmail.com
> >
> > > wrote:
> > >
> > > > Sorry for the late question but is there a reason to choose SHA-1 and
> > > > SHA-256 instead of bcrypt?
> > > >
> > > > https://codahale.com/how-to-safely-store-a-password/
> > > >
> > > > On Fri, Nov 11, 2016 at 5:30 AM, Rajini Sivaram <
> > > > rajinisiva...@googlemail.com> wrote:
> > > >
> > > > > I think all the comments and suggestions on this thread have now
> been
> > > > > incorporated into the KIP. If there are no objections, I will start
> > the
> > > > > voting process on Monday.
> > > > >
> > > > > Regards,
> > > > >
> > > > > Rajini
> > > > >
> > > > > On Tue, Nov 8, 2016 at 9:20 PM, Rajini Sivaram <
> > > > > rajinisiva...@googlemail.com
> > > > > > wrote:
> > > > >
> > > > > > Jun,
> > > > > >
> > > > > > Have added a sub-section on delegation token support to the KIP.
> > > > > >
> > > > > > Thank you,
> > > > > >
> > > > > > Rajini
> > > > > >
> > > > > > On Tue, Nov 8, 2016 at 8:07 PM, Jun Rao <j...@confluent.io>
> wrote:
> > > > > >
> > > > > >> Hi, Rajini,
> > > > > >>
> > > > > >> That makes sense. Could you document this potential future
> > extension
> > > > in
> > > > > >> the
> > > > > >> KIP?
> > > > > >>
> > > > > >> Jun
> > > > > >>
> > > > > >> On Tue, Nov 8, 2016 at 11:17 AM, Rajini Sivaram <
> > > > > >> rajinisiva...@googlemail.com> wrote:
> > > > > >>
> > > > > >> > Jun,
> > > > > >> >
> > > > > >> > 11. SCRAM messages have an optional extensions field which is
> a
> > > list
> > > > > of
> > > > > >> > key=value pairs. We can add an extension key to the first
> client
> > > > > >> message to
> > > > > 

Re: [VOTE] KIP-48 Support for delegation tokens as an authentication mechanism

2017-02-08 Thread Roger Hoover
Thanks.  I found the other discussion thread.  Sorry for being behind on
this.

I'm interested in the future impersonation use cases.  This seems to get us
closer.

+1 (non-binding)

On Wed, Feb 8, 2017 at 4:41 AM, Manikumar <manikumar.re...@gmail.com> wrote:

> Hi Roger,
>
> In the current proposal, we allow a user to get a delegation token only for
> that user.
> Anyone who gets that token can impersonate the user on the broker.
>
> Yes, In future we can extend the support to allow a user to acquire
> delegation tokens for
> other users.
>
> Please refer to the [DISCUSS] mail thread for the impersonation-related discussion.
>
> Thanks,
> Manikumar
>
> On Wed, Feb 8, 2017 at 8:37 AM, Roger Hoover <roger.hoo...@gmail.com>
> wrote:
>
> > Hi Jun,
> >
> > How does it allow impersonation at the connection level?  Looking at the
> > KIP, the DelegationTokenRequest does not have an "Owner" field that can
> be
> > set.   The owner field of the DelegationTokenResponse says it's the
> "Kafka
> > Principal which requested the delegation token".  For impersonation,
> don't
> > we need to be able to get tokens for other users besides the one making
> the
> > request?
> >
> > Thanks,
> >
> > Roger
> >
> > On Tue, Feb 7, 2017 at 6:45 PM, Jun Rao <j...@confluent.io> wrote:
> >
> > > Hi, Roger,
> > >
> > > Just to clarify. This KIP already allows you to do impersonation at the
> > > connection level. Are you talking about impersonation at the request
> > level?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Tue, Feb 7, 2017 at 5:53 PM, Roger Hoover <roger.hoo...@gmail.com>
> > > wrote:
> > >
> > > > Just wondering...how difficult would it be to later add
> > impersonation
> > > (
> > > > https://issues.apache.org/jira/browse/KAFKA-3712)?  One use case
> would
> > > be
> > > > a
> > > > Kafka admin UI that would take action on the cluster on behalf of
> > different
> > > > users.  I suppose we could later add an "effectiveUserId" (in Unix
> > > > terminology) to the token details?
> > > >
> > > > On Tue, Feb 7, 2017 at 5:25 PM, Grant Henke <ghe...@cloudera.com>
> > wrote:
> > > >
> > > > > +1 from me as well.
> > > > >
> > > > > On Tue, Feb 7, 2017 at 7:10 PM, Jason Gustafson <
> ja...@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Looks like a great proposal! I noticed that key rotation is not
> > > > included.
> > > > > > That may be reasonable for the initial work, but it might be nice
> > to
> > > > > share
> > > > > > some thoughts on how that might work in the future. For example,
> I
> > > > could
> > > > > > imagine delegation.token.master.key could be a list, which would
> > > allow
> > > > > > users to support both a new and old key at the same time while
> > > clients
> > > > > are
> > > > > > upgrading keys.
> > > > > >
> > > > > > -Jason
> > > > > >
> > > > > > On Tue, Feb 7, 2017 at 4:42 PM, Gwen Shapira <g...@confluent.io>
> > > > wrote:
> > > > > >
> > > > > > > Read the KIP again and I think it looks good.
> > > > > > >
> > > > > > > +1 from me.
> > > > > > >
> > > > > > > On Tue, Feb 7, 2017 at 3:05 PM, Jun Rao <j...@confluent.io>
> > wrote:
> > > > > > > > Hi, Mani,
> > > > > > > >
> > > > > > > > If a token expires, then every broker will potentially try to
> > > > delete
> > > > > it
> > > > > > > > around the same time, but only one will succeed. So, we will
> > have
> > > > to
> > > > > > deal
> > > > > > > > with failures in that case? Another way is to let just one
> > broker
> > > > > (say,
> > > > > > > the
> > > > > > > > controller) delete expired tokens.
> > > > > > > >
> > > > > > > > It would also be helpful for others to give feedback on this
> > KIP.
> > > > > > Rajini,
> > 

Re: [VOTE] KIP-48 Support for delegation tokens as an authentication mechanism

2017-02-07 Thread Roger Hoover
Hi Jun,

How does it allow impersonation at the connection level?  Looking at the
KIP, the DelegationTokenRequest does not have an "Owner" field that can be
set.   The owner field of the DelegationTokenResponse says it's the "Kafka
Principal which requested the delegation token".  For impersonation, don't
we need to be able to get tokens for other users besides the one making the
request?

Thanks,

Roger

On Tue, Feb 7, 2017 at 6:45 PM, Jun Rao <j...@confluent.io> wrote:

> Hi, Roger,
>
> Just to clarify. This KIP already allows you to do impersonation at the
> connection level. Are you talking about impersonation at the request level?
>
> Thanks,
>
> Jun
>
> On Tue, Feb 7, 2017 at 5:53 PM, Roger Hoover <roger.hoo...@gmail.com>
> wrote:
>
> > Just wondering...how difficult would it be to later add impersonation
> (
> > https://issues.apache.org/jira/browse/KAFKA-3712)?  One use case would
> be
> > a
> > Kafka admin UI that would take action on the cluster on behalf of different
> > users.  I suppose we could later add an "effectiveUserId" (in Unix
> > terminology) to the token details?
> >
> > On Tue, Feb 7, 2017 at 5:25 PM, Grant Henke <ghe...@cloudera.com> wrote:
> >
> > > +1 from me as well.
> > >
> > > On Tue, Feb 7, 2017 at 7:10 PM, Jason Gustafson <ja...@confluent.io>
> > > wrote:
> > >
> > > > Looks like a great proposal! I noticed that key rotation is not
> > included.
> > > > That may be reasonable for the initial work, but it might be nice to
> > > share
> > > > some thoughts on how that might work in the future. For example, I
> > could
> > > > imagine delegation.token.master.key could be a list, which would
> allow
> > > > users to support both a new and old key at the same time while
> clients
> > > are
> > > > upgrading keys.
> > > >
> > > > -Jason
> > > >
> > > > On Tue, Feb 7, 2017 at 4:42 PM, Gwen Shapira <g...@confluent.io>
> > wrote:
> > > >
> > > > > Read the KIP again and I think it looks good.
> > > > >
> > > > > +1 from me.
> > > > >
> > > > > On Tue, Feb 7, 2017 at 3:05 PM, Jun Rao <j...@confluent.io> wrote:
> > > > > > Hi, Mani,
> > > > > >
> > > > > > If a token expires, then every broker will potentially try to
> > delete
> > > it
> > > > > > around the same time, but only one will succeed. So, we will have
> > to
> > > > deal
> > > > > > with failures in that case? Another way is to let just one broker
> > > (say,
> > > > > the
> > > > > > controller) delete expired tokens.
> > > > > >
> > > > > > It would also be helpful for others to give feedback on this KIP.
> > > > Rajini,
> > > > > > Gwen, Ismael?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > >
> > > > > > On Sun, Feb 5, 2017 at 9:54 AM, Manikumar <
> > manikumar.re...@gmail.com
> > > >
> > > > > wrote:
> > > > > >
> > > > > >> Hi Jun,
> > > > > >>
> > > > > >>  Please see the replies inline.
> > > > > >>
> > > > > >>
> > > > > >> > >
> > > > > >> > > Only one broker does the deletion. Broker updates the
> > expiration
> > > > in
> > > > > its
> > > > > >> > > local cache
> > > > > >> > > and on zookeeper so other brokers also get notified and
> their
> > > > cache
> > > > > >> > > statuses are updated as well.
> > > > > >> > >
> > > > > >> > >
> > > > > >> > Which broker does the deletion?
> > > > > >> >
> > > > > >>
> > > > > >> Any broker can handle the create/expire/renew/describe
> > > delegationtoken
> > > > > >> requests.
> > > > > >> changes are propagated through zk notifications.  Every broker
> is
> > > > > >> responsible for
> > > > > >> expiring the tokens. This check can be done during requ

Re: [VOTE] KIP-48 Support for delegation tokens as an authentication mechanism

2017-02-07 Thread Roger Hoover
Just wondering...how difficult would it be to later add impersonation (
https://issues.apache.org/jira/browse/KAFKA-3712)?  One use case would be a
Kafka admin UI that would take action on the cluster on behalf of different
users.  I suppose we could later add an "effectiveUserId" (in Unix
terminology) to the token details?
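The "effectiveUserId" idea can be sketched as a hypothetical token shape. Field names here are invented for illustration; KIP-48's actual token only carries the owner:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical shape only: KIP-48 tokens carry the requesting principal as
# the owner; an impersonation extension might add an effective user, similar
# to Unix real vs. effective UIDs. All field names are illustrative.
@dataclass
class DelegationToken:
    owner: str                     # principal that requested the token
    effective_user: Optional[str]  # principal the token acts as, if impersonating
    hmac: bytes
    expiry_ms: int

    def acting_principal(self) -> str:
        # Authorization would be performed as the effective user when present.
        return self.effective_user or self.owner

token = DelegationToken("admin-ui", "alice", b"\x00" * 32, 1486512000000)
print(token.acting_principal())  # alice
```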

On Tue, Feb 7, 2017 at 5:25 PM, Grant Henke  wrote:

> +1 from me as well.
>
> On Tue, Feb 7, 2017 at 7:10 PM, Jason Gustafson 
> wrote:
>
> > Looks like a great proposal! I noticed that key rotation is not included.
> > That may be reasonable for the initial work, but it might be nice to
> share
> > some thoughts on how that might work in the future. For example, I could
> > imagine delegation.token.master.key could be a list, which would allow
> > users to support both a new and old key at the same time while clients
> are
> > upgrading keys.
> >
> > -Jason
> >
> > On Tue, Feb 7, 2017 at 4:42 PM, Gwen Shapira  wrote:
> >
> > > Read the KIP again and I think it looks good.
> > >
> > > +1 from me.
> > >
> > > On Tue, Feb 7, 2017 at 3:05 PM, Jun Rao  wrote:
> > > > Hi, Mani,
> > > >
> > > > If a token expires, then every broker will potentially try to delete
> it
> > > > around the same time, but only one will succeed. So, we will have to
> > deal
> > > > with failures in that case? Another way is to let just one broker
> (say,
> > > the
> > > > controller) delete expired tokens.
> > > >
> > > > It would also be helpful for others to give feedback on this KIP.
> > Rajini,
> > > > Gwen, Ismael?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Sun, Feb 5, 2017 at 9:54 AM, Manikumar  >
> > > wrote:
> > > >
> > > >> Hi Jun,
> > > >>
> > > >>  Please see the replies inline.
> > > >>
> > > >>
> > > >> > >
> > > >> > > Only one broker does the deletion. Broker updates the expiration
> > in
> > > its
> > > >> > > local cache
> > > >> > > and on zookeeper so other brokers also get notified and their
> > cache
> > > >> > > statuses are updated as well.
> > > >> > >
> > > >> > >
> > > >> > Which broker does the deletion?
> > > >> >
> > > >>
> > > >> Any broker can handle the create/expire/renew/describe
> delegationtoken
> > > >> requests.
> > > >> changes are propagated through zk notifications.  Every broker is
> > > >> responsible for
> > > >> expiring the tokens. This check can be done during request handling
> > time
> > > >> and/or
> > > >> during token authentication time.
> > > >>
> > > >>
> > > >> >
> > > >> >
> > > >> > 110. The diagrams in the wiki still show MD5 digest. Could you
> > change
> > > it
> > > >> to
> > > >> > SCRAM?
> > > >> >
> > > >> >
> > > >>   Updated the diagram.
> > > >>
> > > >>
> > > >>
> > > >> Thanks,
> > > >> Manikumar
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> >
> > > >> >
> > > >> > >
> > > >> > > Thanks.
> > > >> > > Manikumar
> > > >> > >
> > > >> > >
> > > >> > > >
> > > >> > > > On Fri, Dec 23, 2016 at 9:26 AM, Manikumar <
> > > >> manikumar.re...@gmail.com>
> > > >> > > > wrote:
> > > >> > > >
> > > >> > > > > Hi,
> > > >> > > > >
> > > >> > > > > I would like to initiate the vote on KIP-48:
> > > >> > > > >
> > > >> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-48+Delegation+token+support+for+Kafka
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > Thanks,
> > > >> > > > > Manikumar
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> > >
> > >
> > > --
> > > Gwen Shapira
> > > Product Manager | Confluent
> > > 650.450.2760 | @gwenshap
> > > Follow us: Twitter | blog
> > >
> >
>
>
>
> --
> Grant Henke
> Software Engineer | Cloudera
> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>
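The key-rotation idea discussed above (accepting both an old and a new master key while clients upgrade) can be sketched with an HMAC check against a list of keys. This is a sketch under assumed names, not Kafka's implementation:

```python
import hashlib
import hmac

def verify_token(mac: bytes, token_info: bytes, master_keys: list) -> bool:
    """Accept a token if its HMAC matches under ANY configured master key.
    Making the master key a list lets old and new keys coexist during a
    rotation. (Illustrative only; names do not match Kafka's code.)"""
    for key in master_keys:
        expected = hmac.new(key, token_info, hashlib.sha256).digest()
        if hmac.compare_digest(expected, mac):
            return True
    return False

old_key, new_key = b"old-master-key", b"new-master-key"
token_info = b"owner=alice;issued=1486512000"
mac = hmac.new(old_key, token_info, hashlib.sha256).digest()
# A token issued under the old key is still valid mid-rotation:
print(verify_token(mac, token_info, [new_key, old_key]))  # True
```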


Re: [DISCUSS] KIP-118: Drop Support for Java 7 in Kafka 0.11

2017-02-03 Thread Roger Hoover
This is great.  Thanks, Ismael.

On Fri, Feb 3, 2017 at 7:35 AM, Grant Henke  wrote:

> Looks good to me. Thanks for handling the KIP.
>
> On Fri, Feb 3, 2017 at 8:49 AM, Damian Guy  wrote:
>
> > Thanks Ismael. Makes sense to me.
> >
> > On Fri, 3 Feb 2017 at 10:39 Ismael Juma  wrote:
> >
> > > Hi all,
> > >
> > > I have posted a KIP for dropping support for Java 7 in Kafka 0.11:
> > >
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-118%3A+Drop+Support+for+Java+7+in+Kafka+0.11
> > >
> > > Most people were supportive when we last discussed the topic[1], but
> > there
> > > were a few concerns. I believe the following should mitigate the
> > concerns:
> > >
> > > 1. The new proposal suggests dropping support in the next major version
> > > instead of the next minor version.
> > > 2. KIP-97 which is part of 0.10.2 means that 0.11 clients will support
> > 0.10
> > > brokers (0.11 brokers will also support 0.10 clients as usual), so
> there
> > is
> > > even more flexibility on incremental upgrades.
> > > 3. Java 9 will be released shortly after the next Kafka release, so
> we'll
> > > be supporting the 2 most recent Java releases, which is a reasonable
> > > policy.
> > > 4. 8 months have passed since the last proposal and the release after
> > > 0.10.2 won't be out for another 4 months, which should hopefully be
> > enough
> > > time for Java 8 to be even more established. We haven't decided when
> the
> > > next major release will happen, but we know that it won't happen before
> > > June 2017.
> > >
> > > Please take a look at the proposal and share your feedback.
> > >
> > > Thanks,
> > > Ismael
> > >
> > > [1] http://search-hadoop.com/m/Kafka/uyzND1oIhV61GS5Sf2
> > >
> >
>
>
>
> --
> Grant Henke
> Software Engineer | Cloudera
> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>


Re: [DISCUSS] KIP-84: Support SASL/SCRAM mechanisms

2017-01-24 Thread Roger Hoover
Thanks, Ismael.  Just curious, why does it not make sense to do bcrypt
in the context of SCRAM?

On Mon, Jan 23, 2017 at 3:54 PM, Ismael Juma <ism...@juma.me.uk> wrote:

> Hi Roger,
>
> SCRAM uses the PBKDF2 mechanism, here's a comparison between PBKDF2 and
> bcrypt:
>
> http://security.stackexchange.com/questions/4781/do-any-security-experts-recommend-bcrypt-for-password-storage/6415#6415
>
> It may be worth supporting bcrypt, but not sure it would make sense to do
> it in the context of SCRAM.
>
> A minor correction: the KIP includes SCRAM-SHA-256 and SCRAM-SHA-512 (not
> SCRAM-SHA-1).
>
> Ismael
>
> On Mon, Jan 23, 2017 at 10:49 PM, Roger Hoover <roger.hoo...@gmail.com>
> wrote:
>
> > Sorry for the late question but is there a reason to choose SHA-1 and
> > SHA-256 instead of bcrypt?
> >
> > https://codahale.com/how-to-safely-store-a-password/
> >
> > On Fri, Nov 11, 2016 at 5:30 AM, Rajini Sivaram <
> > rajinisiva...@googlemail.com> wrote:
> >
> > > I think all the comments and suggestions on this thread have now been
> > > incorporated into the KIP. If there are no objections, I will start the
> > > voting process on Monday.
> > >
> > > Regards,
> > >
> > > Rajini
> > >
> > > On Tue, Nov 8, 2016 at 9:20 PM, Rajini Sivaram <
> > > rajinisiva...@googlemail.com
> > > > wrote:
> > >
> > > > Jun,
> > > >
> > > > Have added a sub-section on delegation token support to the KIP.
> > > >
> > > > Thank you,
> > > >
> > > > Rajini
> > > >
> > > > On Tue, Nov 8, 2016 at 8:07 PM, Jun Rao <j...@confluent.io> wrote:
> > > >
> > > >> Hi, Rajini,
> > > >>
> > > >> That makes sense. Could you document this potential future extension
> > in
> > > >> the
> > > >> KIP?
> > > >>
> > > >> Jun
> > > >>
> > > >> On Tue, Nov 8, 2016 at 11:17 AM, Rajini Sivaram <
> > > >> rajinisiva...@googlemail.com> wrote:
> > > >>
> > > >> > Jun,
> > > >> >
> > > >> > 11. SCRAM messages have an optional extensions field which is a
> list
> > > of
> > > >> > key=value pairs. We can add an extension key to the first client
> > > >> message to
> > > >> > indicate delegation token. Broker can then obtain credentials and
> > > >> principal
> > > >> > using a different code path for delegation tokens.
> > > >> >
> > > >> > On Tue, Nov 8, 2016 at 6:38 PM, Jun Rao <j...@confluent.io> wrote:
> > > >> >
> > > >> > > Magnus,
> > > >> > >
> > > >> > > Thanks for the input. If you don't feel strongly the need to
> bump
> > up
> > > >> the
> > > >> > > version of SaslHandshake, we can leave the version unchanged.
> > > >> > >
> > > >> > > Rajini,
> > > >> > >
> > > >> > > 11. Yes, we could send the HMAC as the SCRAM password for the
> > > >> delegation
> > > >> > > token. Do we need something to indicate that this SCRAM token is
> > > >> special
> > > >> > > (i.e., delegation token) so that we can generate the correct
> > > >> > > KafkaPrincipal? The delegation token logic can be added later. I
> > am
> > > >> > asking
> > > >> > > just so that we have enough in the design of SCRAM to add the
> > > >> delegation
> > > >> > > token logic later.
> > > >> > >
> > > >> > > Thanks,
> > > >> > >
> > > >> > > Jun
> > > >> > >
> > > >> > >
> > > >> > > On Tue, Nov 8, 2016 at 4:42 AM, Rajini Sivaram <
> > > >> > > rajinisiva...@googlemail.com
> > > >> > > > wrote:
> > > >> > >
> > > >> > > > Hi Jun,
> > > >> > > >
> > > >> > > > 10. *s=<salt>* and *i=<iterations>* come from the SCRAM
> standard
> > > >> (they
> > > >> > > are
> > > >> > > > transferred during SCRAM auth). Scram messages look like (for
> > > >> ex

Re: [DISCUSS] KIP-84: Support SASL/SCRAM mechanisms

2017-01-23 Thread Roger Hoover
Sorry for the late question but is there a reason to choose SHA-1 and
SHA-256 instead of bcrypt?

https://codahale.com/how-to-safely-store-a-password/
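For context on the answer that follows: SCRAM (RFC 5802) derives its SaltedPassword with PBKDF2 (called "Hi" in the RFC), which is why the mechanism family is tied to PBKDF2 rather than bcrypt. A minimal sketch using Python's standard library (parameter values are illustrative):

```python
import hashlib

# SCRAM's SaltedPassword := Hi(password, salt, iterations), where Hi is
# PBKDF2 with HMAC of the chosen hash. Salt and iteration count here are
# illustrative; real values come from the server's SCRAM credentials.
salted = hashlib.pbkdf2_hmac("sha256", b"secret-password", b"random-salt", 4096)
print(len(salted))  # 32 (the SHA-256 digest size)
```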

On Fri, Nov 11, 2016 at 5:30 AM, Rajini Sivaram <
rajinisiva...@googlemail.com> wrote:

> I think all the comments and suggestions on this thread have now been
> incorporated into the KIP. If there are no objections, I will start the
> voting process on Monday.
>
> Regards,
>
> Rajini
>
> On Tue, Nov 8, 2016 at 9:20 PM, Rajini Sivaram <
> rajinisiva...@googlemail.com
> > wrote:
>
> > Jun,
> >
> > Have added a sub-section on delegation token support to the KIP.
> >
> > Thank you,
> >
> > Rajini
> >
> > On Tue, Nov 8, 2016 at 8:07 PM, Jun Rao  wrote:
> >
> >> Hi, Rajini,
> >>
> >> That makes sense. Could you document this potential future extension in
> >> the
> >> KIP?
> >>
> >> Jun
> >>
> >> On Tue, Nov 8, 2016 at 11:17 AM, Rajini Sivaram <
> >> rajinisiva...@googlemail.com> wrote:
> >>
> >> > Jun,
> >> >
> >> > 11. SCRAM messages have an optional extensions field which is a list
> of
> >> > key=value pairs. We can add an extension key to the first client
> >> message to
> >> > indicate delegation token. Broker can then obtain credentials and
> >> principal
> >> > using a different code path for delegation tokens.
> >> >
> >> > On Tue, Nov 8, 2016 at 6:38 PM, Jun Rao  wrote:
> >> >
> >> > > Magnus,
> >> > >
> >> > > Thanks for the input. If you don't feel strongly the need to bump up
> >> the
> >> > > version of SaslHandshake, we can leave the version unchanged.
> >> > >
> >> > > Rajini,
> >> > >
> >> > > 11. Yes, we could send the HMAC as the SCRAM password for the
> >> delegation
> >> > > token. Do we need something to indicate that this SCRAM token is
> >> special
> >> > > (i.e., delegation token) so that we can generate the correct
> >> > > KafkaPrincipal? The delegation token logic can be added later. I am
> >> > asking
> >> > > just so that we have enough in the design of SCRAM to add the
> >> delegation
> >> > > token logic later.
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Jun
> >> > >
> >> > >
> >> > > On Tue, Nov 8, 2016 at 4:42 AM, Rajini Sivaram <
> >> > > rajinisiva...@googlemail.com
> >> > > > wrote:
> >> > >
> >> > > > Hi Jun,
> >> > > >
> >> > > > 10. *s=<salt>* and *i=<iterations>* come from the SCRAM standard
> >> (they
> >> > > are
> >> > > > transferred during SCRAM auth). Scram messages look like (for
> >> example)
> >> > > > *r=<nonce>,s=<salt>,i=<iterations>*. StoredKey and ServerKey are
> >> not
> >> > > > transferred in SCRAM messages, so I picked two keys that are
> unused
> >> in
> >> > > > SCRAM.
> >> > > >
> >> > > > 11. SCRAM (like DIGEST-MD5 or PLAIN) uses a shared secret/password
> >> for
> >> > > > authentication along with a username and an optional
> >> authorization-id.
> >> > > > Kafka uses the username as the identity (Kafka principal) for
> >> > > > authentication and authorization. KIP-48 doesn't mention
> >> KafkaPrincipal
> >> > > in
> >> > > > the section "Authentication using Token", but a delegation token
> is
> >> > > > associated with a Kafka principal. Since delegation tokens are
> >> acquired
> >> > > on
> >> > > > behalf of a KafkaPrincipal and the principal is included in the
> >> token
> >> > as
> >> > > > the token owner,  clients authenticating with delegation tokens
> >> could
> >> > use
> >> > > > the token owner as username and the token HMAC as shared
> >> > secret/password.
> >> > > >
> >> > > > If necessary, any other form of token identifier may be used as
> >> > username
> >> > > as
> >> > > > well as long as it contains sufficient information for the broker
> to
> >> > > > retrieve/compute the principal and HMAC for authentication. The
> >> server
> >> > > > callback handler can be updated when delegation tokens are
> >> implemented
> >> > to
> >> > > > generate Kafka principal accordingly.
> >> > > >
> >> > > >
> >> > > > On Tue, Nov 8, 2016 at 1:03 AM, Jun Rao  wrote:
> >> > > >
> >> > > > > Hi, Rajini,
> >> > > > >
> >> > > > > A couple of other questions on the KIP.
> >> > > > >
> >> > > > > 10. For the config values stored in ZK, are those keys (s, t, k,
> >> i,
> >> > > etc)
> >> > > > > stored under scram-sha-256 standard?
> >> > > > >
> >> > > > > 11. Could KIP-48 (delegation token) use this KIP to send
> >> delegation
> >> > > > tokens?
> >> > > > > In KIP-48, the client sends a HMAC as the delegation token to
> the
> >> > > server.
> >> > > > > Not sure how this gets mapped to the username/password in this
> >> KIP.
> >> > > > >
> >> > > > > Thanks,
> >> > > > >
> >> > > > > Jun
> >> > > > >
> >> > > > > On Tue, Oct 4, 2016 at 6:43 AM, Rajini Sivaram <
> >> > > > > rajinisiva...@googlemail.com
> >> > > > > > wrote:
> >> > > > >
> >> > > > > > Hi all,
> >> > > > > >
> >> > > > > > I have just created KIP-84 to add SCRAM-SHA-1 and
> SCRAM-SHA-256
> >> > SASL
> >> > > > > > mechanisms to Kafka:
> >> > > > > >
> >> > > > > > 
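For reference, the SCRAM message fields discussed in this thread look like the following (the salt and iteration values are RFC 5802's own example, not Kafka's):

```python
# Example SCRAM server-first-message, with values from RFC 5802's example:
server_first = "r=fyko+d2lbbFgONRv9qkxdawL3rfcNHYJY1ZVvWVs7j,s=QSXCR+Q6sek8bf92,i=4096"
# s= carries the base64 salt and i= the PBKDF2 iteration count. StoredKey
# and ServerKey never appear on the wire, which is why those names are free
# to reuse as stored-credential property keys.
fields = dict(f.split("=", 1) for f in server_first.split(","))
print(sorted(fields))  # ['i', 'r', 's']
```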

[jira] [Closed] (KAFKA-4670) Kafka Consumer should validate FetchResponse

2017-01-19 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover closed KAFKA-4670.
---

> Kafka Consumer should validate FetchResponse
> 
>
> Key: KAFKA-4670
> URL: https://issues.apache.org/jira/browse/KAFKA-4670
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.10.2.0
>Reporter: Roger Hoover
>Assignee: Jason Gustafson
>Priority: Minor
>
> As a negative test case, I purposefully configured a bad advertised listener 
> endpoint.  
> {code}
> advertised.listeners=PLAINTEXT://www.google.com:80
> {code}
> This causes the Consumer to over-allocate and run out of memory.
> {quote}
> [2017-01-18 10:03:03,866] DEBUG Sending metadata request 
> (type=MetadataRequest, topics=foo) to node -1 
> (org.apache.kafka.clients.NetworkClient)
> [2017-01-18 10:03:03,870] DEBUG Updated cluster metadata version 2 to 
> Cluster(id = oerqPfCuTCKYUUaWdFUSVQ, nodes = [www.google.com:80 (id: 0 rack: 
> null)], partitions = [Partition(topic = foo, partition = 0, leader = 0, 
> replicas = [0], isr = [0])]) (org.apache.kafka.clients.Metadata)
> [2017-01-18 10:03:03,871] DEBUG Received group coordinator response 
> ClientResponse(receivedTimeMs=1484762583870, latencyMs=88, 
> disconnected=false, 
> requestHeader={api_key=10,api_version=0,correlation_id=0,client_id=consumer-1},
>  
> responseBody={error_code=0,coordinator={node_id=0,host=www.google.com,port=80}})
>  (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2017-01-18 10:03:03,871] INFO Discovered coordinator www.google.com:80 (id: 
> 2147483647 rack: null) for group console-consumer-64535. 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2017-01-18 10:03:03,871] DEBUG Initiating connection to node 2147483647 at 
> www.google.com:80. (org.apache.kafka.clients.NetworkClient)
> [2017-01-18 10:03:03,915] INFO Revoking previously assigned partitions [] for 
> group console-consumer-64535 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2017-01-18 10:03:03,915] INFO (Re-)joining group console-consumer-64535 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2017-01-18 10:03:03,917] DEBUG Sending JoinGroup ((type: JoinGroupRequest, 
> groupId=console-consumer-64535, sessionTimeout=1, 
> rebalanceTimeout=30, memberId=, protocolType=consumer, 
> groupProtocols=org.apache.kafka.common.requests.JoinGroupRequest$ProtocolMetadata@564fabc8))
>  to coordinator www.google.com:80 (id: 2147483647 rack: null) 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2017-01-18 10:03:03,932] DEBUG Created socket with SO_RCVBUF = 66646, 
> SO_SNDBUF = 131874, SO_TIMEOUT = 0 to node 2147483647 
> (org.apache.kafka.common.network.Selector)
> [2017-01-18 10:03:03,932] DEBUG Completed connection to node 2147483647.  
> Fetching API versions. (org.apache.kafka.clients.NetworkClient)
> [2017-01-18 10:03:03,932] DEBUG Initiating API versions fetch from node 
> 2147483647. (org.apache.kafka.clients.NetworkClient)
> [2017-01-18 10:03:03,990] ERROR Unknown error when running consumer:  
> (kafka.tools.ConsoleConsumer$)
> java.lang.OutOfMemoryError: Java heap space
>   at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>   at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:169)
>   at 
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:150)
>   at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:355)
>   at org.apache.kafka.common.network.Selector.poll(Selector.java:303)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:346)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:226)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:172)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:331)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:300)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(Consume
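A side note on the node id 2147483647 appearing throughout the log above: the Java client opens a dedicated connection for the group coordinator and, to keep it distinct from the ordinary connection to the same broker, derives its node id as Integer.MAX_VALUE minus the broker id. A minimal sketch of that convention (the broker ids are illustrative):

```python
# The Java consumer keeps the group-coordinator connection separate from the
# regular connection to the same broker by giving it a synthetic node id:
# Integer.MAX_VALUE - broker_id.  Broker 0 therefore shows up as 2147483647.
JAVA_INT_MAX = 2**31 - 1  # Java's Integer.MAX_VALUE

def coordinator_connection_id(broker_id: int) -> int:
    """Synthetic node id used for the coordinator connection."""
    return JAVA_INT_MAX - broker_id

print(coordinator_connection_id(0))  # 2147483647, matching the log lines above
```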

[jira] [Commented] (KAFKA-4670) Kafka Consumer should validate FetchResponse

2017-01-19 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830670#comment-15830670
 ] 

Roger Hoover commented on KAFKA-4670:
-

Ah, yes, thanks, [~ijuma].  

> Kafka Consumer should validate FetchResponse
> 
>
> Key: KAFKA-4670
> URL: https://issues.apache.org/jira/browse/KAFKA-4670
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.10.2.0
>Reporter: Roger Hoover
>Assignee: Jason Gustafson
>Priority: Minor
>
> As a negative test case, I purposefully configured a bad advertised listener 
> endpoint.  
> {code}
> advertised.listeners=PLAINTEXT://www.google.com:80
> {code}
> This causes the Consumer to over-allocate and run out of memory.
> {quote}
> [2017-01-18 10:03:03,866] DEBUG Sending metadata request 
> (type=MetadataRequest, topics=foo) to node -1 
> (org.apache.kafka.clients.NetworkClient)
> [2017-01-18 10:03:03,870] DEBUG Updated cluster metadata version 2 to 
> Cluster(id = oerqPfCuTCKYUUaWdFUSVQ, nodes = [www.google.com:80 (id: 0 rack: 
> null)], partitions = [Partition(topic = foo, partition = 0, leader = 0, 
> replicas = [0], isr = [0])]) (org.apache.kafka.clients.Metadata)
> [2017-01-18 10:03:03,871] DEBUG Received group coordinator response 
> ClientResponse(receivedTimeMs=1484762583870, latencyMs=88, 
> disconnected=false, 
> requestHeader={api_key=10,api_version=0,correlation_id=0,client_id=consumer-1},
>  
> responseBody={error_code=0,coordinator={node_id=0,host=www.google.com,port=80}})
>  (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2017-01-18 10:03:03,871] INFO Discovered coordinator www.google.com:80 (id: 
> 2147483647 rack: null) for group console-consumer-64535. 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2017-01-18 10:03:03,871] DEBUG Initiating connection to node 2147483647 at 
> www.google.com:80. (org.apache.kafka.clients.NetworkClient)
> [2017-01-18 10:03:03,915] INFO Revoking previously assigned partitions [] for 
> group console-consumer-64535 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2017-01-18 10:03:03,915] INFO (Re-)joining group console-consumer-64535 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2017-01-18 10:03:03,917] DEBUG Sending JoinGroup ((type: JoinGroupRequest, 
> groupId=console-consumer-64535, sessionTimeout=1, 
> rebalanceTimeout=30, memberId=, protocolType=consumer, 
> groupProtocols=org.apache.kafka.common.requests.JoinGroupRequest$ProtocolMetadata@564fabc8))
>  to coordinator www.google.com:80 (id: 2147483647 rack: null) 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2017-01-18 10:03:03,932] DEBUG Created socket with SO_RCVBUF = 66646, 
> SO_SNDBUF = 131874, SO_TIMEOUT = 0 to node 2147483647 
> (org.apache.kafka.common.network.Selector)
> [2017-01-18 10:03:03,932] DEBUG Completed connection to node 2147483647.  
> Fetching API versions. (org.apache.kafka.clients.NetworkClient)
> [2017-01-18 10:03:03,932] DEBUG Initiating API versions fetch from node 
> 2147483647. (org.apache.kafka.clients.NetworkClient)
> [2017-01-18 10:03:03,990] ERROR Unknown error when running consumer:  
> (kafka.tools.ConsoleConsumer$)
> java.lang.OutOfMemoryError: Java heap space
>   at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>   at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:169)
>   at 
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:150)
>   at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:355)
>   at org.apache.kafka.common.network.Selector.poll(Selector.java:303)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:346)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:226)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:172)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:331)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:300)
>   at 
> org.apach

[jira] [Created] (KAFKA-4670) Kafka Consumer should validate FetchResponse

2017-01-18 Thread Roger Hoover (JIRA)
Roger Hoover created KAFKA-4670:
---

 Summary: Kafka Consumer should validate FetchResponse
 Key: KAFKA-4670
 URL: https://issues.apache.org/jira/browse/KAFKA-4670
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 0.10.2.0
Reporter: Roger Hoover
Assignee: Jason Gustafson
Priority: Minor


As a negative test case, I purposefully configured a bad advertised listener 
endpoint.  

{code}
advertised.listeners=PLAINTEXT://www.google.com:80
{code}

This causes the Consumer to over-allocate and run out of memory.

{quote}
[2017-01-18 10:03:03,866] DEBUG Sending metadata request (type=MetadataRequest, 
topics=foo) to node -1 (org.apache.kafka.clients.NetworkClient)
[2017-01-18 10:03:03,870] DEBUG Updated cluster metadata version 2 to 
Cluster(id = oerqPfCuTCKYUUaWdFUSVQ, nodes = [www.google.com:80 (id: 0 rack: 
null)], partitions = [Partition(topic = foo, partition = 0, leader = 0, 
replicas = [0], isr = [0])]) (org.apache.kafka.clients.Metadata)
[2017-01-18 10:03:03,871] DEBUG Received group coordinator response 
ClientResponse(receivedTimeMs=1484762583870, latencyMs=88, disconnected=false, 
requestHeader={api_key=10,api_version=0,correlation_id=0,client_id=consumer-1}, 
responseBody={error_code=0,coordinator={node_id=0,host=www.google.com,port=80}})
 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2017-01-18 10:03:03,871] INFO Discovered coordinator www.google.com:80 (id: 
2147483647 rack: null) for group console-consumer-64535. 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2017-01-18 10:03:03,871] DEBUG Initiating connection to node 2147483647 at 
www.google.com:80. (org.apache.kafka.clients.NetworkClient)
[2017-01-18 10:03:03,915] INFO Revoking previously assigned partitions [] for 
group console-consumer-64535 
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2017-01-18 10:03:03,915] INFO (Re-)joining group console-consumer-64535 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2017-01-18 10:03:03,917] DEBUG Sending JoinGroup ((type: JoinGroupRequest, 
groupId=console-consumer-64535, sessionTimeout=1, rebalanceTimeout=30, 
memberId=, protocolType=consumer, 
groupProtocols=org.apache.kafka.common.requests.JoinGroupRequest$ProtocolMetadata@564fabc8))
 to coordinator www.google.com:80 (id: 2147483647 rack: null) 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2017-01-18 10:03:03,932] DEBUG Created socket with SO_RCVBUF = 66646, 
SO_SNDBUF = 131874, SO_TIMEOUT = 0 to node 2147483647 
(org.apache.kafka.common.network.Selector)
[2017-01-18 10:03:03,932] DEBUG Completed connection to node 2147483647.  
Fetching API versions. (org.apache.kafka.clients.NetworkClient)
[2017-01-18 10:03:03,932] DEBUG Initiating API versions fetch from node 
2147483647. (org.apache.kafka.clients.NetworkClient)
[2017-01-18 10:03:03,990] ERROR Unknown error when running consumer:  
(kafka.tools.ConsoleConsumer$)
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
at 
org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
at 
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
at 
org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:169)
at 
org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:150)
at 
org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:355)
at org.apache.kafka.common.network.Selector.poll(Selector.java:303)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:346)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:226)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:172)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:331)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:300)
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:261)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1025)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:990)
at kafka.consumer.NewShinyConsumer.<init>(BaseConsumer.scala:55)
at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:69)
at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:50)
at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
{quote}

It seems like the consumer should validate responses.
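The mechanics of the failure are worth spelling out. Kafka's wire protocol frames every response as a 4-byte big-endian size followed by the payload, and at the time of this report the consumer allocated a buffer of whatever size the peer claimed. Pointed at an HTTP server, the client reads the ASCII bytes "HTTP" as the size prefix, which decode to roughly 1.1 GiB. A sketch of the misinterpretation (not Kafka code):

```python
import struct

# Kafka responses are framed as [int32 size][payload].  An HTTP server's
# reply starts with the bytes "HTTP", which the client misreads as a size.
reply = b"HTTP/1.1 400 Bad Request\r\n"

(bogus_size,) = struct.unpack(">i", reply[:4])  # big-endian int32 of b"HTTP"
print(bogus_size)  # 1213486160 bytes, i.e. about 1.1 GiB -> OutOfMemoryError
```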

Re: [VOTE] KIP-108: Create Topic Policy

2017-01-09 Thread Roger Hoover
+1 (non-binding)

On Mon, Jan 9, 2017 at 2:15 AM, Edoardo Comar  wrote:

> Ismael,
> thanks for the KIP. I see it as quite useful in a managed cloud
> environment.
>
> +1 (non-binding)
> --
> Edoardo Comar
> IBM MessageHub
> eco...@uk.ibm.com
> IBM UK Ltd, Hursley Park, SO21 2JN
>
> IBM United Kingdom Limited Registered in England and Wales with number
> 741598 Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6
> 3AU
>
>
>
> From:   Ismael Juma 
> To: dev@kafka.apache.org
> Date:   08/01/2017 14:50
> Subject:[VOTE] KIP-108: Create Topic Policy
> Sent by:isma...@gmail.com
>
>
>
> Hi all,
>
> As the discussion seems to have settled down, I would like to initiate the
> voting process for KIP-108: Create Topic Policy:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-108
> %3A+Create+Topic+Policy
>
> The vote will run for a minimum of 72 hours.
>
> Thanks,
> Ismael
>
>
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>


Re: [DISCUSS] KIP-108: Create Topic Policy

2017-01-09 Thread Roger Hoover
Got it.  Thanks, Ismael.

On Mon, Jan 9, 2017 at 10:42 AM, Ismael Juma <ism...@juma.me.uk> wrote:

> Hi Roger,
>
> That's a good question. The server defaults are passed via the `configure`
> method of the `Configurable` interface that is implemented by
> `CreateTopicPolicy`. I'll mention this explicitly in the KIP.
>
> Ismael
>
> On Mon, Jan 9, 2017 at 6:04 PM, Roger Hoover <roger.hoo...@gmail.com>
> wrote:
>
> > This is great.  Thanks, Ismael.
> >
> > One question.  When TopicDetails are passed to the policy implementation,
> > would the server defaults already have been merged?  If not, I think the
> > policy also needs access to the server defaults.
> >
> > Cheers,
> >
> > Roger
> >
> > On Fri, Jan 6, 2017 at 9:26 AM, Ismael Juma <ism...@juma.me.uk> wrote:
> >
> > > Thanks for the review Jun. Yes, that's a good point, I have updated the
> > > KIP.
> > >
> > > Ismael
> > >
> > > On Fri, Jan 6, 2017 at 5:15 PM, Jun Rao <j...@confluent.io> wrote:
> > >
> > > > Hi, Ismael,
> > > >
> > > > Thanks for the KIP. Looks reasonable to me. To be consistent with the
> > > > pattern used in other pluggable interfaces, we probably should make
> the
> > > new
> > > > interface configurable and closable?
> > > >
> > > > Jun
> > > >
> > > > On Fri, Jan 6, 2017 at 4:16 AM, Ismael Juma <ism...@juma.me.uk>
> wrote:
> > > >
> > > > > Thanks Dan and Colin for the feedback. I updated the KIP to include
> > the
> > > > > addition of a validation mode. Since we need to bump the protocol
> > > version
> > > > > for that, I also added an error message per topic to the response.
> I
> > > had
> > > > > the latter as "Future Work", but I actually felt that it should be
> in
> > > the
> > > > > first version (good to have feedback confirming that).
> > > > >
> > > > > Let me know if the changes look good to you.
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Thu, Jan 5, 2017 at 9:54 PM, Colin McCabe <cmcc...@apache.org>
> > > wrote:
> > > > >
> > > > > > Yeah, I agree... having a validation mode would be nice.  We
> should
> > > be
> > > > > > explicit that passing validation doesn't 100% guarantee that a
> > > > > > subsequent call to create the topic will succeed, though.  There
> is
> > > an
> > > > > > obvious race condition there-- for example, with a plugin which
> > > > consults
> > > > > > some external authentication system, there could be a change to
> the
> > > > > > privileges in between validation and attempted creation.
> > > > > >
> > > > > > It also seems like we should try to provide a helpful exception
> > > message
> > > > > > for the cases where topic creation fails.  This might involve
> > adding
> > > > > > more detail about error conditions to CreateTopicsRequest...
> right
> > > now
> > > > > > it just returns an error code, but a text message would be a nice
> > > > > > addition.
> > > > > >
> > > > > > cheers,
> > > > > > Colin
> > > > > >
> > > > > >
> > > > > > On Thu, Jan 5, 2017, at 13:41, dan wrote:
> > > > > > > it would be nice to have a dry-run or validate ability added to
> > > this
> > > > > kip.
> > > > > > > since we are offloading validation to a 3rd party implementor a
> > > > random
> > > > > > > user
> > > > > > > can't know a priori (based solely on kafka configs) whether a
> > call
> > > > > should
> > > > > > > succeed without actually creating the topic.
> > > > > > >
> > > > > > > a similar case is in connect where there is a separate endpoint
> > > > > > > <https://github.com/apache/kafka/blob/trunk/connect/
> > > > > > runtime/src/main/java/org/apache/kafka/connect/runtime/rest/
> > > resources/
> > > > > > ConnectorPluginsResource.java#L49-L58>
> > > > > > > to attempt to validate a connect configuration without actually
> > > > > creating
> > > > > > > the connector.
> > > > > > >
> > > > > > > thanks
> > > > > > > dan
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Jan 5, 2017 at 7:34 AM, Ismael Juma <ism...@juma.me.uk
> >
> > > > wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > We've posted "KIP-108: Create Topic Policy" for discussion:
> > > > > > > >
> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > > > 108%3A+Create+Topic+Policy
> > > > > > > >
> > > > > > > > Please take a look. Your feedback is appreciated.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Ismael
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-108: Create Topic Policy

2017-01-09 Thread Roger Hoover
This is great.  Thanks, Ismael.

One question.  When TopicDetails are passed to the policy implementation,
would the server defaults already have been merged?  If not, I think the
policy also needs access to the server defaults.

Cheers,

Roger

On Fri, Jan 6, 2017 at 9:26 AM, Ismael Juma  wrote:

> Thanks for the review Jun. Yes, that's a good point, I have updated the
> KIP.
>
> Ismael
>
> On Fri, Jan 6, 2017 at 5:15 PM, Jun Rao  wrote:
>
> > Hi, Ismael,
> >
> > Thanks for the KIP. Looks reasonable to me. To be consistent with the
> > pattern used in other pluggable interfaces, we probably should make the
> new
> > interface configurable and closable?
> >
> > Jun
> >
> > On Fri, Jan 6, 2017 at 4:16 AM, Ismael Juma  wrote:
> >
> > > Thanks Dan and Colin for the feedback. I updated the KIP to include the
> > > addition of a validation mode. Since we need to bump the protocol
> version
> > > for that, I also added an error message per topic to the response. I
> had
> > > the latter as "Future Work", but I actually felt that it should be in
> the
> > > first version (good to have feedback confirming that).
> > >
> > > Let me know if the changes look good to you.
> > >
> > > Ismael
> > >
> > > On Thu, Jan 5, 2017 at 9:54 PM, Colin McCabe 
> wrote:
> > >
> > > > Yeah, I agree... having a validation mode would be nice.  We should
> be
> > > > explicit that passing validation doesn't 100% guarantee that a
> > > > subsequent call to create the topic will succeed, though.  There is
> an
> > > > obvious race condition there-- for example, with a plugin which
> > consults
> > > > some external authentication system, there could be a change to the
> > > > privileges in between validation and attempted creation.
> > > >
> > > > It also seems like we should try to provide a helpful exception
> message
> > > > for the cases where topic creation fails.  This might involve adding
> > > > more detail about error conditions to CreateTopicsRequest... right
> now
> > > > it just returns an error code, but a text message would be a nice
> > > > addition.
> > > >
> > > > cheers,
> > > > Colin
> > > >
> > > >
> > > > On Thu, Jan 5, 2017, at 13:41, dan wrote:
> > > > > it would be nice to have a dry-run or validate ability added to
> this
> > > kip.
> > > > > since we are offloading validation to a 3rd party implementor a
> > random
> > > > > user
> > > > > can't know a priori (based solely on kafka configs) whether a call
> > > should
> > > > > succeed without actually creating the topic.
> > > > >
> > > > > a similar case is in connect where there is a separate endpoint
> > > > > <https://github.com/apache/kafka/blob/trunk/connect/
> > > > > runtime/src/main/java/org/apache/kafka/connect/runtime/rest/
> > > > > resources/ConnectorPluginsResource.java#L49-L58>
> > > > > to attempt to validate a connect configuration without actually
> > > creating
> > > > > the connector.
> > > > >
> > > > > thanks
> > > > > dan
> > > > >
> > > > >
> > > > > On Thu, Jan 5, 2017 at 7:34 AM, Ismael Juma 
> > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > We've posted "KIP-108: Create Topic Policy" for discussion:
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > 108%3A+Create+Topic+Policy
> > > > > >
> > > > > > Please take a look. Your feedback is appreciated.
> > > > > >
> > > > > > Thanks,
> > > > > > Ismael
> > > > > >
> > > >
> > >
> >
>
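For readers skimming the thread: the KIP's `CreateTopicPolicy` is a Java interface, but the control flow under discussion — merge server defaults, then validate, with an optional validate-only (dry-run) mode — can be sketched in a few lines. Everything below (function and parameter names, the specific rules) is a hypothetical illustration, not the KIP's actual API:

```python
class PolicyViolation(Exception):
    """Raised when a create-topic request fails the pluggable policy."""

def validate_create_topic(requested: dict, server_defaults: dict) -> dict:
    # Merge server defaults first (Roger's question in the thread), so the
    # policy validates the *effective* topic config, then apply the rules.
    effective = {**server_defaults, **requested}
    if effective["replication_factor"] < 2:
        raise PolicyViolation("replication factor must be >= 2")
    if effective["partitions"] > 1000:
        raise PolicyViolation("too many partitions")
    return effective

# A dry-run / validate-only mode (dan's suggestion) just runs the policy
# without actually creating the topic:
cfg = validate_create_topic({"partitions": 12},
                            {"partitions": 1, "replication_factor": 3})
print(cfg)  # {'partitions': 12, 'replication_factor': 3}
```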


Re: KIP-103: Separation of Internal and External traffic

2017-01-06 Thread Roger Hoover
Thank you, Ismael.

Sent from my iPhone

> On Jan 6, 2017, at 4:46 PM, Ismael Juma <ism...@juma.me.uk> wrote:
> 
> Thanks Roger. I asked around and it seems like `listener name` is what
> people found most intuitive in the context of configs. So, I have updated
> the KIP to use that.
> 
> Ismael
> 
>> On Fri, Jan 6, 2017 at 9:42 PM, Roger Hoover <roger.hoo...@gmail.com> wrote:
>> 
>> Ismael,
>> 
>> Listener id would also convey uniqueness but I'm ok with listener key as
>> well since it  fits with the use of the term "map" in other properties.
>> 
>> My initial feeling against the word "key" was that it seemed more natural
>> in documentation to say that Kafka allows multiple listeners (even with the
>> same protocol) and that listeners are identified by name or ID.  It seemed
>> a little more awkward to talk about listeners having keys as identifiers.
>> The fact that the listener ID is used as a key in config maps is secondary.
>> 
>> Your suggestion for removing the protocol prefix makes sense.  Listeners
>> must have a protocol but that ZooKeeper field is only meant to hold the
>> listener ID.
>> 
>> Cheers,
>> 
>> Roger
>> 
>> Sent from my iPhone
>> 
>>> On Jan 6, 2017, at 12:24 PM, Ismael Juma <ism...@juma.me.uk> wrote:
>>> 
>>> Hi Roger,
>>> 
>>> I think `listener_key` makes it clear that it has to be unique per
>>> listener, so I prefer it a little over `listener_name`. Since the
>> existing
>>> config is called `listeners` instead of `protocol.listeners`, maybe we
>>> don't need the protocol prefix?
>>> 
>>> Ismael
>>> 
>>>> On Fri, Jan 6, 2017 at 7:48 PM, Roger Hoover <roger.hoo...@gmail.com>
>> wrote:
>>>> 
>>>> Maybe it's clearer to say protocol_listener_name?  The proposed
>> config
>>>> allows you to name each listener and refer to their names in various
>>>> places.
>>>> 
>>>> 
>>>> 
>>>>> On Wed, Jan 4, 2017 at 4:34 AM, Ismael Juma <ism...@juma.me.uk> wrote:
>>>>> 
>>>>> Hi Colin,
>>>>> 
>>>>> Thanks for the feedback. It's a good question regarding the name
>>>> `protocol
>>>>> label`. It was an easy starting name given that the security protocol
>> was
>>>>> replaced by a label in the listener string. However, I agree that it's
>>>>> perhaps not as clear as it could be. Maybe `listener key` would be a
>>>> better
>>>>> name? It makes it clear that it should be unique in a listeners list
>> and
>>>>> that it's used to associate a listener to something else (like a
>> security
>>>>> protocol). Thoughts?
>>>>> 
>>>>> Ismael
>>>>> 
>>>>> On Wed, Jan 4, 2017 at 12:29 AM, Colin McCabe <cmcc...@apache.org>
>>>> wrote:
>>>>> 
>>>>>> Good idea.  It would be really nice to be able to constrain
>> replication
>>>>>> traffic to a specific interface or use different security settings.
>>>>>> 
>>>>>> I'm having a little trouble understanding the "protocol label"
>> concept.
>>>>>> Clearly protocol labels map to protocols, but they also seem to
>>>> identify
>>>>>> particular types of traffic.  Would it be more appropriate to call
>>>> these
>>>>>> "traffic types" or "endpoint types"?  Or am I misunderstanding the
>>>>>> proposal?
>>>>>> 
>>>>>> cheers,
>>>>>> Colin
>>>>>> 
>>>>>> 
>>>>>>> On Thu, Dec 22, 2016, at 08:00, Ismael Juma wrote:
>>>>>>> I've updated the KIP to:
>>>>>>> 
>>>>>>> 1. Include the ability to set different security configs depending on
>>>>> the
>>>>>>> protocol label.
>>>>>>> 2. Include the mapping from protocol label to security protocol in ZK
>>>>> and
>>>>>>> UpdateMetadataRequest.
>>>>>>> 3. More items in the "Rejected Alternatives" section.
>>>>>>> 4. Take into account old ZooKeeper-based consumers.
>>>>>>> 
>>>>>>> Feedback is appreciated as always.
>>>>>>> 
>>>>>>&g

Re: KIP-103: Separation of Internal and External traffic

2017-01-06 Thread Roger Hoover
Ismael,

Listener id would also convey uniqueness but I'm ok with listener key as well 
since it  fits with the use of the term "map" in other properties.

My initial feeling against the word "key" was that it seemed more natural in
documentation to say that Kafka allows multiple listeners (even with the same
protocol) and that listeners are identified by name or ID.  It seemed a little
more awkward to talk about listeners having keys as identifiers.  The fact that
the listener ID is used as a key in config maps is secondary.

Your suggestion for removing the protocol prefix makes sense.  Listeners must 
have a protocol but that ZooKeeper field is only meant to hold the listener ID.

Cheers,

Roger 

Sent from my iPhone

> On Jan 6, 2017, at 12:24 PM, Ismael Juma <ism...@juma.me.uk> wrote:
> 
> Hi Roger,
> 
> I think `listener_key` makes it clear that it has to be unique per
> listener, so I prefer it a little over `listener_name`. Since the existing
> config is called `listeners` instead of `protocol.listeners`, maybe we
> don't need the protocol prefix?
> 
> Ismael
> 
>> On Fri, Jan 6, 2017 at 7:48 PM, Roger Hoover <roger.hoo...@gmail.com> wrote:
>> 
>> Maybe it's clearer to say protocol_listener_name?  The proposed config
>> allows you to name each listener and refer to their names in various
>> places.
>> 
>> 
>> 
>>> On Wed, Jan 4, 2017 at 4:34 AM, Ismael Juma <ism...@juma.me.uk> wrote:
>>> 
>>> Hi Colin,
>>> 
>>> Thanks for the feedback. It's a good question regarding the name
>> `protocol
>>> label`. It was an easy starting name given that the security protocol was
>>> replaced by a label in the listener string. However, I agree that it's
>>> perhaps not as clear as it could be. Maybe `listener key` would be a
>> better
>>> name? It makes it clear that it should be unique in a listeners list and
>>> that it's used to associate a listener to something else (like a security
>>> protocol). Thoughts?
>>> 
>>> Ismael
>>> 
>>> On Wed, Jan 4, 2017 at 12:29 AM, Colin McCabe <cmcc...@apache.org>
>> wrote:
>>> 
>>>> Good idea.  It would be really nice to be able to constrain replication
>>>> traffic to a specific interface or use different security settings.
>>>> 
>>>> I'm having a little trouble understanding the "protocol label" concept.
>>>> Clearly protocol labels map to protocols, but they also seem to
>> identify
>>>> particular types of traffic.  Would it be more appropriate to call
>> these
>>>> "traffic types" or "endpoint types"?  Or am I misunderstanding the
>>>> proposal?
>>>> 
>>>> cheers,
>>>> Colin
>>>> 
>>>> 
>>>>> On Thu, Dec 22, 2016, at 08:00, Ismael Juma wrote:
>>>>> I've updated the KIP to:
>>>>> 
>>>>> 1. Include the ability to set different security configs depending on
>>> the
>>>>> protocol label.
>>>>> 2. Include the mapping from protocol label to security protocol in ZK
>>> and
>>>>> UpdateMetadataRequest.
>>>>> 3. More items in the "Rejected Alternatives" section.
>>>>> 4. Take into account old ZooKeeper-based consumers.
>>>>> 
>>>>> Feedback is appreciated as always.
>>>>> 
>>>>> I'm particularly interested in people's opinions on the config format
>>> as
>>>>> I
>>>>> am still unsure when it comes to the proposed format versus the first
>>>>> rejected alternative.
>>>>> 
>>>>> Ismael
>>>>> 
>>>>> On Wed, Dec 21, 2016 at 11:37 PM, Ismael Juma <ism...@juma.me.uk>
>>> wrote:
>>>>> 
>>>>>> Thanks Rajini.
>>>>>> 
>>>>>> I agree that it's worth thinking about what a fully configurable
>>> label
>>>>>> would look like. I'll update the KIP.
>>>>>> 
>>>>>> Ismael
>>>>>> 
>>>>>> On 21 Dec 2016 10:53 pm, "Rajini Sivaram" <rajinisiva...@gmail.com
>>> 
>>>> wrote:
>>>>>> 
>>>>>> Hi Ismael,
>>>>>> 
>>>>>> Thank you for the KIP. This is a very useful change.
>>>>>> 
>>>>>> Once you allow multiple interfaces with the same security protocol,
>>> you
>>>>&

Re: KIP-103: Separation of Internal and External traffic

2017-01-06 Thread Roger Hoover
Maybe it's clearer to say protocol_listener_name?  The proposed config
allows you to name each listener and refer to their names in various places.



On Wed, Jan 4, 2017 at 4:34 AM, Ismael Juma  wrote:

> Hi Colin,
>
> Thanks for the feedback. It's a good question regarding the name `protocol
> label`. It was an easy starting name given that the security protocol was
> replaced by a label in the listener string. However, I agree that it's
> perhaps not as clear as it could be. Maybe `listener key` would be a better
> name? It makes it clear that it should be unique in a listeners list and
> that it's used to associate a listener to something else (like a security
> protocol). Thoughts?
>
> Ismael
>
> On Wed, Jan 4, 2017 at 12:29 AM, Colin McCabe  wrote:
>
> > Good idea.  It would be really nice to be able to constrain replication
> > traffic to a specific interface or use different security settings.
> >
> > I'm having a little trouble understanding the "protocol label" concept.
> > Clearly protocol labels map to protocols, but they also seem to identify
> > particular types of traffic.  Would it be more appropriate to call these
> > "traffic types" or "endpoint types"?  Or am I misunderstanding the
> > proposal?
> >
> > cheers,
> > Colin
> >
> >
> > On Thu, Dec 22, 2016, at 08:00, Ismael Juma wrote:
> > > I've updated the KIP to:
> > >
> > > 1. Include the ability to set different security configs depending on
> the
> > > protocol label.
> > > 2. Include the mapping from protocol label to security protocol in ZK
> and
> > > UpdateMetadataRequest.
> > > 3. More items in the "Rejected Alternatives" section.
> > > 4. Take into account old ZooKeeper-based consumers.
> > >
> > > Feedback is appreciated as always.
> > >
> > > I'm particularly interested in people's opinions on the config format
> as
> > > I
> > > am still unsure when it comes to the proposed format versus the first
> > > rejected alternative.
> > >
> > > Ismael
> > >
> > > On Wed, Dec 21, 2016 at 11:37 PM, Ismael Juma 
> wrote:
> > >
> > > > Thanks Rajini.
> > > >
> > > > I agree that it's worth thinking about what a fully configurable
> label
> > > > would look like. I'll update the KIP.
> > > >
> > > > Ismael
> > > >
> > > > On 21 Dec 2016 10:53 pm, "Rajini Sivaram" 
> > wrote:
> > > >
> > > > Hi Ismael,
> > > >
> > > > Thank you for the KIP. This is a very useful change.
> > > >
> > > > Once you allow multiple interfaces with the same security protocol,
> you
> > > > will soon also need to be able to configure protocol-specific
> > properties
> > > > for each of the interfaces. To use SSL on internal and external
> > networks,
> > > > you would almost definitely want different keystores with different
> > > > hostname/IP addresses. Similarly for SASL, you might want to enable
> > > > different mechanisms, use a different authentication server etc. This
> > is
> > > > listed under future work. But it may be worth thinking about what a
> > fully
> > > > configurable 'label' looks like. Would every property now become a
> > list/map
> > > > like listeners - you would then end up with maps of lists for some
> > > > properties. It will be good if all properties corresponding to a label
> > > > including listener and advertised.listener are configured
> consistently
> > - if
> > > > that is possible.
> > > >
> > > >
> > > > On Wed, Dec 21, 2016 at 8:56 PM, Ismael Juma 
> > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > We've posted "KIP-103: Separation of Internal and External traffic"
> > for
> > > > > discussion:
> > > > >
> > > > > *https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > 103%3A+Separation+of+Internal+and+External+traffic
> > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-103%3A+Separation+of+Internal+and+External+traffic>*
> > > > >
> > > > > Please take a look. Your feedback is appreciated.
> > > > >
> > > > > Thanks,
> > > > > Ismael
> > > > >
> > > >
> > > >
> > > >
> >
>
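For context on where the naming debate landed: KIP-103 shipped with "listener name", configured on the broker via `listener.security.protocol.map`, which associates each listener name with a security protocol. A minimal sketch of parsing such a value (the parser itself is illustrative, not Kafka code):

```python
def parse_listener_protocol_map(value: str) -> dict:
    """Parse a listener.security.protocol.map value into {listener: protocol}."""
    mapping = {}
    for entry in value.split(","):
        listener, _, protocol = entry.strip().partition(":")
        mapping[listener] = protocol
    return mapping

# e.g. plaintext inside the network, SASL_SSL for clients, SSL for replication
m = parse_listener_protocol_map("INTERNAL:PLAINTEXT,EXTERNAL:SASL_SSL,REPLICATION:SSL")
print(m["EXTERNAL"])  # SASL_SSL
```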


Re: [VOTE] KIP-103: Separation of Internal and External traffic

2017-01-06 Thread Roger Hoover
+1 (non-binding)

On Fri, Jan 6, 2017 at 11:16 AM, Tom Crayford  wrote:

> +1 (non-binding)
>
> On Fri, Jan 6, 2017 at 6:58 PM, Colin McCabe  wrote:
>
> > Looks good.  +1 (non-binding).
> >
> > What do you think about changing "protocol label" to "listener key"?
> >
> > best,
> > Colin
> >
> >
> > On Fri, Jan 6, 2017, at 09:23, Neha Narkhede wrote:
> > > +1
> > >
> > > On Fri, Jan 6, 2017 at 9:21 AM Jun Rao  wrote:
> > >
> > > > Hi, Ismael,
> > > >
> > > > Thanks for the KIP. +1
> > > >
> > > > Jun
> > > >
> > > > On Fri, Jan 6, 2017 at 2:51 AM, Ismael Juma 
> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > As the discussion seems to have settled down, I would like to
> > initiate
> > > > the
> > > > > voting process for KIP-103: Separation of Internal and External
> > traffic:
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > 103%3A+Separation+of+Internal+and+External+traffic
> > > > >
> > > > > The vote will run for a minimum of 72 hours.
> > > > >
> > > > > Thanks,
> > > > > Ismael
> > > > >
> > > >
> > > --
> > > Thanks,
> > > Neha
> >
>


[jira] [Commented] (KAFKA-4527) Transient failure of ConnectDistributedTest.test_pause_and_resume_sink where paused connector produces messages

2016-12-18 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15759269#comment-15759269
 ] 

Roger Hoover commented on KAFKA-4527:
-

Happened again: 

http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-18--001.1482053747--apache--trunk--d6b0b52/

> Transient failure of ConnectDistributedTest.test_pause_and_resume_sink where 
> paused connector produces messages
> ---
>
> Key: KAFKA-4527
> URL: https://issues.apache.org/jira/browse/KAFKA-4527
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect, system tests
>Reporter: Ewen Cheslack-Postava
>Assignee: Shikhar Bhushan
>  Labels: system-test-failure, system-tests
> Fix For: 0.10.2.0
>
>
> {quote}
> 
> test_id:
> kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_pause_and_resume_sink
> status: FAIL
> run time:   40.164 seconds
> Paused sink connector should not consume any messages
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/connect/connect_distributed_test.py",
>  line 257, in test_pause_and_resume_sink
> assert num_messages == len(self.sink.received_messages()), "Paused sink 
> connector should not consume any messages"
> AssertionError: Paused sink connector should not consume any messages
> {quote}
> See one case here: 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-12--001.1481535295--apache--trunk--62e043a/report.html
>  but it has also happened before, e.g. 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-06--001.1481017508--apache--trunk--34aa538/report.html
> Thinking about the test, one simple possibility is that our approach to get 
> the number of messages produced/consumed during the test is flawed -- I think 
> we may not account for additional buffering between the connectors and the 
> process reading their output to determine what they have produced. However, 
> that's just a theory -- the minimal checking on the logs that I did didn't 
> reveal anything obviously wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3808) Transient failure in ReplicaVerificationToolTest

2016-12-18 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15759266#comment-15759266
 ] 

Roger Hoover commented on KAFKA-3808:
-

Happened again:  
http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-18--001.1482053747--apache--trunk--d6b0b52/



> Transient failure in ReplicaVerificationToolTest
> 
>
> Key: KAFKA-3808
> URL: https://issues.apache.org/jira/browse/KAFKA-3808
> Project: Kafka
>  Issue Type: Bug
>  Components: system tests
>Reporter: Geoff Anderson
>
> {code}
> test_id:
> 2016-05-29--001.kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags
> status: FAIL
> run time:   1 minute 9.231 seconds
> Timed out waiting to reach non-zero number of replica lags.
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py",
>  line 106, in run_all_tests
> data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py",
>  line 162, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py",
>  line 88, in test_replica_lags
> err_msg="Timed out waiting to reach non-zero number of replica lags.")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Timed out waiting to reach non-zero number of replica lags.
> {code}
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-05-29--001.1464540508--apache--trunk--404b696/report.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
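The `wait_until` helper that raises these TimeoutErrors behaves roughly as follows. This is a sketch inferred from the tracebacks above (signature and backoff behavior assumed; the real ducktape implementation may differ, and defines its own TimeoutError rather than using Python 3's builtin):

```python
import time

def wait_until(condition, timeout_sec=30, backoff_sec=0.1, err_msg=""):
    """Poll `condition` until it returns a truthy value, sleeping
    `backoff_sec` between attempts; raise TimeoutError(err_msg) if the
    deadline passes first."""
    deadline = time.time() + timeout_sec
    while time.time() < deadline:
        if condition():
            return
        time.sleep(backoff_sec)
    raise TimeoutError(err_msg)
```

The system tests above fail exactly on this last line, e.g. with err_msg="Timed out waiting to reach non-zero number of replica lags."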


[jira] [Commented] (KAFKA-4166) TestMirrorMakerService.test_bounce transient system test failure

2016-12-18 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15759264#comment-15759264
 ] 

Roger Hoover commented on KAFKA-4166:
-

Similar failure: 
http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-18--001.1482053747--apache--trunk--d6b0b52/

> TestMirrorMakerService.test_bounce transient system test failure
> 
>
> Key: KAFKA-4166
> URL: https://issues.apache.org/jira/browse/KAFKA-4166
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>Assignee: Jason Gustafson
>  Labels: transient-system-test-failure
>
> We've only seen one failure so far and it's a timeout error so it could be an 
> environment issue. Filing it here so that we can track it in case there are 
> additional failures:
> {code}
> Module: kafkatest.tests.core.mirror_maker_test
> Class:  TestMirrorMakerService
> Method: test_bounce
> Arguments:
> {
>   "clean_shutdown": true,
>   "new_consumer": true,
>   "security_protocol": "SASL_SSL"
> }
> {code}
>  
> {code}
> test_id:
> 2016-09-12--001.kafkatest.tests.core.mirror_maker_test.TestMirrorMakerService.test_bounce.clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True
> status: FAIL
> run time:   3 minutes 30.354 seconds
> 
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 106, in run_all_tests
> data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 162, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/mark/_mark.py",
>  line 331, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/mirror_maker_test.py",
>  line 178, in test_bounce
> self.run_produce_consume_validate(core_test_action=lambda: 
> self.bounce(clean_shutdown=clean_shutdown))
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 79, in run_produce_consume_validate
> raise e
> TimeoutError
> {code}
>  
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-09-12--001.1473700895--apache--trunk--a7ab9cb/TestMirrorMakerService/test_bounce/clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True.tgz



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4526) Transient failure in ThrottlingTest.test_throttled_reassignment

2016-12-18 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15759260#comment-15759260
 ] 

Roger Hoover commented on KAFKA-4526:
-

Thanks, [~apurva]

> Transient failure in ThrottlingTest.test_throttled_reassignment
> ---
>
> Key: KAFKA-4526
> URL: https://issues.apache.org/jira/browse/KAFKA-4526
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ewen Cheslack-Postava
>Assignee: Apurva Mehta
>  Labels: system-test-failure, system-tests
> Fix For: 0.10.2.0
>
>
> This test is seeing transient failures sometimes
> {quote}
> Module: kafkatest.tests.core.throttling_test
> Class:  ThrottlingTest
> Method: test_throttled_reassignment
> Arguments:
> {
>   "bounce_brokers": false
> }
> {quote}
> This happens with both bounce_brokers = true and false. Fails with
> {quote}
> AssertionError: 1646 acked message did not make it to the Consumer. They are: 
> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus 
> 1626 more. Total Acked: 174799, Total Consumed: 173153. We validated that the 
> first 1000 of these missing messages correctly made it into Kafka's data 
> files. This suggests they were lost on their way to the consumer.
> {quote}
> See 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-12--001.1481535295--apache--trunk--62e043a/report.html
>  for an example.
> Note that there are a number of similar bug reports for different tests: 
> https://issues.apache.org/jira/issues/?jql=text%20~%20%22acked%20message%20did%20not%20make%20it%20to%20the%20Consumer%22%20and%20project%20%3D%20Kafka
>  I am wondering if we have a wrong ack setting somewhere that we should be 
> specifying as acks=all but is only defaulting to 0?
> It also seems interesting that the missing messages in these recent failures 
> seem to always start at 0...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-82 - Add Record Headers

2016-12-17 Thread Roger Hoover
Matthias,

Thanks for your input.  I'm +1 on control messages as they seem to be the
simplest way to implement watermarks (
https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102), a
feature that would add a lot of value to Kafka Streams IMHO.

Your argument that the control-message indicator belongs in the client-only
section of the record format makes sense.  Just to make sure I understand,
are you suggesting that control messages would be indicated by a standard
reserved header?  By standard, I mean that ALL Kafka consumers would know
to handle these messages differently (possibly just ignoring them).  This
would need to be added to the specification of the consumer protocol so
that all Kafka clients implement it, right?  I think it's a good idea but
just checking.

Cheers,

Roger
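The standard-reserved-header idea could look roughly like this on the client side. Every name here (the header key, the Record shape, subscribe_control) is a hypothetical illustration, not an actual Kafka consumer API:

```python
# Assumed reserved header key marking a record as a control message.
CONTROL_HEADER = "__control__"

class Record:
    """Minimal stand-in for a consumed record with headers."""
    def __init__(self, value, headers=None):
        self.value = value
        self.headers = headers or {}

class SkippingConsumer:
    """Auto-drops control records unless the application signs up for them."""

    def __init__(self, fetched):
        self._fetched = fetched          # records fetched from the broker
        self._control_handler = None

    def subscribe_control(self, handler):
        """Opt in: deliver control records to `handler` via callback
        instead of silently dropping them."""
        self._control_handler = handler

    def poll(self):
        out = []
        for rec in self._fetched:
            if CONTROL_HEADER in rec.headers:
                if self._control_handler is not None:
                    self._control_handler(rec)
                continue                 # never exposed through poll()
            out.append(rec)
        return out
```

An application unaware of control messages polls as usual and sees only data records; the single application that understands the marker calls subscribe_control to receive it.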


On Wed, Dec 14, 2016 at 9:51 AM, Matthias J. Sax 
wrote:

> Yes and no. I did overload the term "control message".
>
> EOS control messages are for client-broker communication and thus never
> exposed to any application. And I think this is a good design because
> broker needs to understand those control messages. Thus, this should be
> a protocol change.
>
> The type of control messages I have in mind are for client-client
> (application-application) communication and the broker is agnostic to
> them. Thus, it should not be a protocol change.
>
>
> -Matthias
>
>
>
> On 12/14/16 9:42 AM, radai wrote:
> > arent control messages getting pushed as their own top level protocol
> > change (and a fairly massive one) for the transactions KIP ?
> >
> > On Tue, Dec 13, 2016 at 5:54 PM, Matthias J. Sax 
> > wrote:
> >
> >> Hi,
> >>
> >> I want to add a completely new angle to this discussion. For this, I
> >> want to propose an extension for the headers feature that enables new
> >> uses cases -- and those new use cases might convince people to support
> >> headers (of course including the larger scoped proposal).
> >>
> >> Extended Proposal:
> >>
> >> Allow messages with a certain header key to be special "control
> >> messages" (w/ or w/o payload) that are not exposed to an application via
> >> .poll().
> >>
> >> Thus, a consumer client would automatically skip over those messages. If
> >> an application knows about embedded control messages, it can "sign up"
> >> for those messages with the consumer client and either get a callback or
> >> have the consumer auto-drop for these messages disabled (allowing it to
> >> consume those messages via poll()).
> >>
> >> (The details need further considerations/discussion. I just want to
> >> sketch the main idea.)
> >>
> >> Usage:
> >>
> >> There is a shared topic (ie, used by multiple applications) and a
> >> producer application wants to embed a special message in the topic for a
> >> dedicated consumer application. Because only one application will
> >> understand this message, it cannot be a regular message as this would
> >> break all applications that do not understand this message. The producer
> >> application would set a special metadata key and no consumer application
> >> would see this control message by default because they did not enable
> >> their consumer client to return this message in poll() (and the client
> >> would just drop this message with special metadata key). Only the single
> >> application that should receive this message, will subscribe to this
> >> message on its consumer client and process it.
> >>
> >>
> >> Concrete Use Case: Kafka Streams
> >>
> >> In Kafka Streams, we would like to propagate "control messages" from
> >> subtopology to subtopology. There are multiple scenarios for which this
> >> would be useful. For example, currently we do not guarantee a
> >> "consistent shutdown" of an application. By this, I mean that input
> >> records might not be completely processed by the whole topology because
> >> the application shutdown happens "in between" and an intermediate result
> >> gets "stuck" in an intermediate topic. Thus, a user would see a
> >> committed offset of the source topic of the application, but no
> >> corresponding result record in the output topic.
> >>
> >> Having "shutdown markers" would allow us to first stop the upstream
> >> subtopology and write this marker into the intermediate topic; the
> >> downstream subtopology would only shut itself down after it sees the
> >> "shutdown marker". Thus, we can guarantee on shutdown that no
> >> "in-flight" messages got stuck in intermediate topics.
> >>
> >>
> >> A similar usage would be for KIP-95 (Incremental Batch Processing).
> >> There was a discussion about the proposed metadata topic, and we could
> >> avoid this metadata topic if we would have "control messages".
> >>
> >>
> >> Right now, we cannot insert an "application control message" because
> >> Kafka Streams does not own all topics it read/writes and thus might
> >> break other consumer application (as described above) if we inject
> >> random messages that are not 

[jira] [Commented] (KAFKA-3808) Transient failure in ReplicaVerificationToolTest

2016-12-17 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15758169#comment-15758169
 ] 

Roger Hoover commented on KAFKA-3808:
-

Happened again:

{code}

test_id:
kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags
status: FAIL
run time:   1 minute 18.074 seconds


Timed out waiting to reach zero replica lags.
Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 123, in run
data = self.run_test()
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 176, in run_test
return self.test_context.function(self.test)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py",
 line 84, in test_replica_lags
err_msg="Timed out waiting to reach zero replica lags.")
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
 line 36, in wait_until
raise TimeoutError(err_msg)
TimeoutError: Timed out waiting to reach zero replica lags.
{code}

http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-17--001.1481967401--apache--trunk--e156f51/

> Transient failure in ReplicaVerificationToolTest
> 
>
> Key: KAFKA-3808
> URL: https://issues.apache.org/jira/browse/KAFKA-3808
> Project: Kafka
>  Issue Type: Bug
>  Components: system tests
>Reporter: Geoff Anderson
>
> {code}
> test_id:
> 2016-05-29--001.kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags
> status: FAIL
> run time:   1 minute 9.231 seconds
> Timed out waiting to reach non-zero number of replica lags.
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py",
>  line 106, in run_all_tests
> data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py",
>  line 162, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py",
>  line 88, in test_replica_lags
> err_msg="Timed out waiting to reach non-zero number of replica lags.")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Timed out waiting to reach non-zero number of replica lags.
> {code}
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-05-29--001.1464540508--apache--trunk--404b696/report.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-4554) ReplicaVerificationToolTest.test_replica_lags system test failure

2016-12-17 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover resolved KAFKA-4554.
-
Resolution: Duplicate

Duplicate of https://issues.apache.org/jira/browse/KAFKA-3808


> ReplicaVerificationToolTest.test_replica_lags system test failure
> -
>
> Key: KAFKA-4554
> URL: https://issues.apache.org/jira/browse/KAFKA-4554
> Project: Kafka
>  Issue Type: Bug
>        Reporter: Roger Hoover
>
> {code}
> 
> test_id:
> kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags
> status: FAIL
> run time:   1 minute 18.074 seconds
> Timed out waiting to reach zero replica lags.
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py",
>  line 84, in test_replica_lags
> err_msg="Timed out waiting to reach zero replica lags.")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Timed out waiting to reach zero replica lags.
> {code}
> http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-17--001.1481967401--apache--trunk--e156f51/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4554) ReplicaVerificationToolTest.test_replica_lags system test failure

2016-12-17 Thread Roger Hoover (JIRA)
Roger Hoover created KAFKA-4554:
---

 Summary: ReplicaVerificationToolTest.test_replica_lags system test 
failure
 Key: KAFKA-4554
 URL: https://issues.apache.org/jira/browse/KAFKA-4554
 Project: Kafka
  Issue Type: Bug
Reporter: Roger Hoover


{code}

test_id:
kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest.test_replica_lags
status: FAIL
run time:   1 minute 18.074 seconds


Timed out waiting to reach zero replica lags.
Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 123, in run
data = self.run_test()
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 176, in run_test
return self.test_context.function(self.test)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/tools/replica_verification_test.py",
 line 84, in test_replica_lags
err_msg="Timed out waiting to reach zero replica lags.")
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
 line 36, in wait_until
raise TimeoutError(err_msg)
TimeoutError: Timed out waiting to reach zero replica lags.
{code}

http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-17--001.1481967401--apache--trunk--e156f51/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4166) TestMirrorMakerService.test_bounce transient system test failure

2016-12-17 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15758158#comment-15758158
 ] 

Roger Hoover commented on KAFKA-4166:
-

Failed again: 
http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-17--001.1481967401--apache--trunk--e156f51/

> TestMirrorMakerService.test_bounce transient system test failure
> 
>
> Key: KAFKA-4166
> URL: https://issues.apache.org/jira/browse/KAFKA-4166
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>Assignee: Jason Gustafson
>  Labels: transient-system-test-failure
>
> We've only seen one failure so far and it's a timeout error so it could be an 
> environment issue. Filing it here so that we can track it in case there are 
> additional failures:
> {code}
> Module: kafkatest.tests.core.mirror_maker_test
> Class:  TestMirrorMakerService
> Method: test_bounce
> Arguments:
> {
>   "clean_shutdown": true,
>   "new_consumer": true,
>   "security_protocol": "SASL_SSL"
> }
> {code}
>  
> {code}
> test_id:
> 2016-09-12--001.kafkatest.tests.core.mirror_maker_test.TestMirrorMakerService.test_bounce.clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True
> status: FAIL
> run time:   3 minutes 30.354 seconds
> 
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 106, in run_all_tests
> data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 162, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/mark/_mark.py",
>  line 331, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/mirror_maker_test.py",
>  line 178, in test_bounce
> self.run_produce_consume_validate(core_test_action=lambda: 
> self.bounce(clean_shutdown=clean_shutdown))
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 79, in run_produce_consume_validate
> raise e
> TimeoutError
> {code}
>  
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-09-12--001.1473700895--apache--trunk--a7ab9cb/TestMirrorMakerService/test_bounce/clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True.tgz



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4526) Transient failure in ThrottlingTest.test_throttled_reassignment

2016-12-17 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15758155#comment-15758155
 ] 

Roger Hoover commented on KAFKA-4526:
-

Failed again

{code}

test_id:
kafkatest.tests.core.throttling_test.ThrottlingTest.test_throttled_reassignment.bounce_brokers=False
status: FAIL
run time:   8 minutes 55.261 seconds


531 acked message did not make it to the Consumer. They are: 0, 1, 2, 3, 4, 
5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus 511 more. Total 
Acked: 172487, Total Consumed: 172119. We validated that the first 531 of these 
missing messages correctly made it into Kafka's data files. This suggests they 
were lost on their way to the consumer.
Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 123, in run
data = self.run_test()
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 176, in run_test
return self.test_context.function(self.test)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
 line 321, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/throttling_test.py",
 line 175, in test_throttled_reassignment
lambda: self.reassign_partitions(bounce_brokers, self.throttle))
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
 line 101, in run_produce_consume_validate
self.validate()
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
 line 163, in validate
assert success, msg
AssertionError: 531 acked message did not make it to the Consumer. They are: 0, 
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus 511 
more. Total Acked: 172487, Total Consumed: 172119. We validated that the first 
531 of these missing messages correctly made it into Kafka's data files. This 
suggests they were lost on their way to the consumer.


test_id:
kafkatest.tests.core.throttling_test.ThrottlingTest.test_throttled_reassignment.bounce_brokers=True
status: FAIL
run time:   8 minutes 52.939 seconds


567 acked message did not make it to the Consumer. They are: 0, 1, 2, 3, 4, 
5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus 547 more. Total 
Acked: 169804, Total Consumed: 169248. We validated that the first 567 of these 
missing messages correctly made it into Kafka's data files. This suggests they 
were lost on their way to the consumer.
Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 123, in run
data = self.run_test()
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 176, in run_test
return self.test_context.function(self.test)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
 line 321, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/throttling_test.py",
 line 175, in test_throttled_reassignment
lambda: self.reassign_partitions(bounce_brokers, self.throttle))
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
 line 101, in run_produce_consume_validate
self.validate()
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
 line 163, in validate
assert success, msg
AssertionError: 567 acked message did not make it to the Consumer. They are: 0, 
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus 547 
more. Total Acked: 169804, Total Consumed: 169248. We validated that the first 
567 of these missing messages correctly made it into Kafka's data files. This 
suggests they were lost on their way to the consumer.
{code}
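The acked-vs-consumed check producing these assertion messages can be sketched as follows (hypothetical reconstruction; the real kafkatest validation in produce_consume_validate.py differs in detail):

```python
def validate(acked, consumed, preview=20):
    """Compare acked message IDs against consumed ones and return
    (success, msg), where msg mimics the failure text seen above."""
    missing = sorted(set(acked) - set(consumed))
    if not missing:
        return True, ""
    shown = ", ".join(str(m) for m in missing[:preview])
    if len(missing) > preview:
        shown += "...plus %d more" % (len(missing) - preview)
    msg = ("%d acked message did not make it to the Consumer. They are: %s. "
           "Total Acked: %d, Total Consumed: %d"
           % (len(missing), shown, len(acked), len(consumed)))
    return False, msg
```

Note that because `missing` is sorted, a run of missing IDs starting at 0, as observed in these failures, points at loss near the very start of consumption.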

http://confluent-systest.s3-website-us-west-2.amazonaws.

[jira] [Created] (KAFKA-4551) StreamsSmokeTest.test_streams intermittent failure

2016-12-16 Thread Roger Hoover (JIRA)
Roger Hoover created KAFKA-4551:
---

 Summary: StreamsSmokeTest.test_streams intermittent failure
 Key: KAFKA-4551
 URL: https://issues.apache.org/jira/browse/KAFKA-4551
 Project: Kafka
  Issue Type: Bug
Reporter: Roger Hoover
Priority: Blocker




{code}
test_id:
kafkatest.tests.streams.streams_smoke_test.StreamsSmokeTest.test_streams
status: FAIL
run time:   4 minutes 44.872 seconds



Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 123, in run
data = self.run_test()
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 176, in run_test
return self.test_context.function(self.test)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/streams/streams_smoke_test.py",
 line 78, in test_streams
node.account.ssh("grep SUCCESS %s" % self.driver.STDOUT_FILE, 
allow_fail=False)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/cluster/remoteaccount.py",
 line 253, in ssh
raise RemoteCommandError(self, cmd, exit_status, stderr.read())
RemoteCommandError: ubuntu@worker6: Command 'grep SUCCESS 
/mnt/streams/streams.stdout' returned non-zero exit status 1.
{code}

http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-15--001.1481794587--apache--trunk--7049938/StreamsSmokeTest/test_streams/91.tgz



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4166) TestMirrorMakerService.test_bounce transient system test failure

2016-12-16 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15754984#comment-15754984
 ] 

Roger Hoover commented on KAFKA-4166:
-

Happened again here:  
http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-16--001.1481880892--apache--trunk--e55205b/

> TestMirrorMakerService.test_bounce transient system test failure
> 
>
> Key: KAFKA-4166
> URL: https://issues.apache.org/jira/browse/KAFKA-4166
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>  Labels: transient-system-test-failure
>
> We've only seen one failure so far and it's a timeout error so it could be an 
> environment issue. Filing it here so that we can track it in case there are 
> additional failures:
> {code}
> Module: kafkatest.tests.core.mirror_maker_test
> Class:  TestMirrorMakerService
> Method: test_bounce
> Arguments:
> {
>   "clean_shutdown": true,
>   "new_consumer": true,
>   "security_protocol": "SASL_SSL"
> }
> {code}
>  
> {code}
> test_id:
> 2016-09-12--001.kafkatest.tests.core.mirror_maker_test.TestMirrorMakerService.test_bounce.clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True
> status: FAIL
> run time:   3 minutes 30.354 seconds
> 
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 106, in run_all_tests
> data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
>  line 162, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape/mark/_mark.py",
>  line 331, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/mirror_maker_test.py",
>  line 178, in test_bounce
> self.run_produce_consume_validate(core_test_action=lambda: 
> self.bounce(clean_shutdown=clean_shutdown))
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 79, in run_produce_consume_validate
> raise e
> TimeoutError
> {code}
>  
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-09-12--001.1473700895--apache--trunk--a7ab9cb/TestMirrorMakerService/test_bounce/clean_shutdown=True.security_protocol=SASL_SSL.new_consumer=True.tgz





[jira] [Commented] (KAFKA-4527) Transient failure of ConnectDistributedTest.test_pause_and_resume_sink where paused connector produces messages

2016-12-16 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15754980#comment-15754980
 ] 

Roger Hoover commented on KAFKA-4527:
-

Happened again:  
http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-16--001.1481880892--apache--trunk--e55205b/

> Transient failure of ConnectDistributedTest.test_pause_and_resume_sink where 
> paused connector produces messages
> ---
>
> Key: KAFKA-4527
> URL: https://issues.apache.org/jira/browse/KAFKA-4527
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect, system tests
>Reporter: Ewen Cheslack-Postava
>Assignee: Shikhar Bhushan
>  Labels: system-test-failure, system-tests
> Fix For: 0.10.2.0
>
>
> {quote}
> 
> test_id:
> kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_pause_and_resume_sink
> status: FAIL
> run time:   40.164 seconds
> Paused sink connector should not consume any messages
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/connect/connect_distributed_test.py",
>  line 257, in test_pause_and_resume_sink
> assert num_messages == len(self.sink.received_messages()), "Paused sink 
> connector should not consume any messages"
> AssertionError: Paused sink connector should not consume any messages
> {quote}
> See one case here: 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-12--001.1481535295--apache--trunk--62e043a/report.html
>  but it has also happened before, e.g. 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-06--001.1481017508--apache--trunk--34aa538/report.html
> Thinking about the test, one simple possibility is that our approach to get 
> the number of messages produced/consumed during the test is flawed -- I think 
> we may not account for additional buffering between the connectors and the 
> process reading their output to determine what they have produced. However, 
> that's just a theory -- the minimal checking on the logs that I did didn't 
> reveal anything obviously wrong.





[jira] [Commented] (KAFKA-4526) Transient failure in ThrottlingTest.test_throttled_reassignment

2016-12-16 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15754982#comment-15754982
 ] 

Roger Hoover commented on KAFKA-4526:
-

Happened again here:  
http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-system-test-results/?prefix=2016-12-16--001.1481880892--apache--trunk--e55205b/

> Transient failure in ThrottlingTest.test_throttled_reassignment
> ---
>
> Key: KAFKA-4526
> URL: https://issues.apache.org/jira/browse/KAFKA-4526
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ewen Cheslack-Postava
>Assignee: Jason Gustafson
>  Labels: system-test-failure, system-tests
> Fix For: 0.10.2.0
>
>
> This test is seeing transient failures sometimes
> {quote}
> Module: kafkatest.tests.core.throttling_test
> Class:  ThrottlingTest
> Method: test_throttled_reassignment
> Arguments:
> {
>   "bounce_brokers": false
> }
> {quote}
> This happens with both bounce_brokers = true and false. Fails with
> {quote}
> AssertionError: 1646 acked message did not make it to the Consumer. They are: 
> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus 
> 1626 more. Total Acked: 174799, Total Consumed: 173153. We validated that the 
> first 1000 of these missing messages correctly made it into Kafka's data 
> files. This suggests they were lost on their way to the consumer.
> {quote}
> See 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-12--001.1481535295--apache--trunk--62e043a/report.html
>  for an example.
> Note that there are a number of similar bug reports for different tests: 
> https://issues.apache.org/jira/issues/?jql=text%20~%20%22acked%20message%20did%20not%20make%20it%20to%20the%20Consumer%22%20and%20project%20%3D%20Kafka
>  I am wondering if we have a wrong ack setting somewhere that we should be 
> specifying as acks=all but is only defaulting to 0?
> It also seems interesting that the missing messages in these recent failures 
> seem to always start at 0...





[jira] [Comment Edited] (KAFKA-4166) TestMirrorMakerService.test_bounce transient system test failure

2016-12-13 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15747454#comment-15747454
 ] 

Roger Hoover edited comment on KAFKA-4166 at 12/14/16 6:37 AM:
---

It happened twice more:

{code}
Module: kafkatest.tests.core.mirror_maker_test
Class:  TestMirrorMakerService
Method: test_bounce
Arguments:
{
  "clean_shutdown": true,
  "new_consumer": false,
  "offsets_storage": "kafka"
}
{code}

http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-13--001.1481621566--apache--trunk--21d7e6f/TestMirrorMakerService/test_bounce/clean_shutdown%3DTrue.offsets_storage%3Dkafka.new_consumer%3DFalse/51.tgz

and

{code}
Module: kafkatest.tests.core.mirror_maker_test
Class:  TestMirrorMakerService
Method: test_simple_end_to_end
Arguments: {
  "security_protocol": "PLAINTEXT",
  "new_consumer": false
}
{code}

http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-13--001.1481621566--apache--trunk--21d7e6f/TestMirrorMakerService/test_simple_end_to_end/security_protocol%3DPLAINTEXT.new_consumer%3DFalse/55.tgz




was (Author: theduderog):
It happened twice more on these tests:

{code}
Module: kafkatest.tests.core.mirror_maker_test
Class:  TestMirrorMakerService
Method: test_bounce
Arguments:
{
  "clean_shutdown": true,
  "new_consumer": false,
  "offsets_storage": "kafka"
}
{code}

http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-13--001.1481621566--apache--trunk--21d7e6f/TestMirrorMakerService/test_bounce/clean_shutdown%3DTrue.offsets_storage%3Dkafka.new_consumer%3DFalse/51.tgz

and

{code}
Module: kafkatest.tests.core.mirror_maker_test
Class:  TestMirrorMakerService
Method: test_simple_end_to_end
Arguments: {
  "security_protocol": "PLAINTEXT",
  "new_consumer": false
}
{code}

http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-13--001.1481621566--apache--trunk--21d7e6f/TestMirrorMakerService/test_simple_end_to_end/security_protocol%3DPLAINTEXT.new_consumer%3DFalse/55.tgz








[jira] [Comment Edited] (KAFKA-4166) TestMirrorMakerService.test_bounce transient system test failure

2016-12-13 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15747454#comment-15747454
 ] 

Roger Hoover edited comment on KAFKA-4166 at 12/14/16 6:36 AM:
---

It happened twice more on these tests:

{code}
Module: kafkatest.tests.core.mirror_maker_test
Class:  TestMirrorMakerService
Method: test_bounce
Arguments:
{
  "clean_shutdown": true,
  "new_consumer": false,
  "offsets_storage": "kafka"
}
{code}

http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-13--001.1481621566--apache--trunk--21d7e6f/TestMirrorMakerService/test_bounce/clean_shutdown%3DTrue.offsets_storage%3Dkafka.new_consumer%3DFalse/51.tgz

and

{code}
Module: kafkatest.tests.core.mirror_maker_test
Class:  TestMirrorMakerService
Method: test_simple_end_to_end
Arguments: {
  "security_protocol": "PLAINTEXT",
  "new_consumer": false
}
{code}

http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-13--001.1481621566--apache--trunk--21d7e6f/TestMirrorMakerService/test_simple_end_to_end/security_protocol%3DPLAINTEXT.new_consumer%3DFalse/55.tgz




was (Author: theduderog):
It happened twice more on these test:

{code}
Module: kafkatest.tests.core.mirror_maker_test
Class:  TestMirrorMakerService
Method: test_bounce
Arguments:
{
  "clean_shutdown": true,
  "new_consumer": false,
  "offsets_storage": "kafka"
}
{code}

http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-13--001.1481621566--apache--trunk--21d7e6f/TestMirrorMakerService/test_bounce/clean_shutdown%3DTrue.offsets_storage%3Dkafka.new_consumer%3DFalse/51.tgz

and

{code}
Module: kafkatest.tests.core.mirror_maker_test
Class:  TestMirrorMakerService
Method: test_simple_end_to_end
Arguments: {
  "security_protocol": "PLAINTEXT",
  "new_consumer": false
}
{code}

http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-13--001.1481621566--apache--trunk--21d7e6f/TestMirrorMakerService/test_simple_end_to_end/security_protocol%3DPLAINTEXT.new_consumer%3DFalse/55.tgz








[jira] [Commented] (KAFKA-4166) TestMirrorMakerService.test_bounce transient system test failure

2016-12-13 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15747454#comment-15747454
 ] 

Roger Hoover commented on KAFKA-4166:
-

It happened twice more on these test:

{code}
Module: kafkatest.tests.core.mirror_maker_test
Class:  TestMirrorMakerService
Method: test_bounce
Arguments:
{
  "clean_shutdown": true,
  "new_consumer": false,
  "offsets_storage": "kafka"
}
{code}

http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-13--001.1481621566--apache--trunk--21d7e6f/TestMirrorMakerService/test_bounce/clean_shutdown%3DTrue.offsets_storage%3Dkafka.new_consumer%3DFalse/51.tgz

and

{code}
Module: kafkatest.tests.core.mirror_maker_test
Class:  TestMirrorMakerService
Method: test_simple_end_to_end
Arguments: {
  "security_protocol": "PLAINTEXT",
  "new_consumer": false
}
{code}

http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-13--001.1481621566--apache--trunk--21d7e6f/TestMirrorMakerService/test_simple_end_to_end/security_protocol%3DPLAINTEXT.new_consumer%3DFalse/55.tgz








[jira] [Commented] (KAFKA-4526) Transient failure in ThrottlingTest.test_throttled_reassignment

2016-12-13 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15747308#comment-15747308
 ] 

Roger Hoover commented on KAFKA-4526:
-

This happened again on the Dec 13 nightly run.






Re: [DISCUSS] KIP-82 - Add Record Headers

2016-11-09 Thread Roger Hoover
Sorry for going a little into the weeds, but thanks for the replies
regarding varint.

Agreed that a prefix and {int, int} can be the same.  It doesn't look like
that's what the KIP is saying in the "Open" section.  The example shows 211
for New Relic and 210002 for App Dynamics, implying that the New Relic
organization will have only a single header id to work with.  Or is 211
a prefix?  The main point of a namespace or prefix is to reduce the
overhead of config mapping or registration, depending on how
namespaces/prefixes are managed.

Would love to hear more feedback on the higher-level questions though...

Cheers,

Roger
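As an illustration of the varint idea discussed in this thread: a protobuf-style base-128 varint uses the high bit of each byte as a continuation flag, so values below 128 fit in a single byte. A minimal Python sketch (the helper names are illustrative, not Kafka code):

```python
def encode_varint(n):
    """Encode a non-negative int as a protobuf-style base-128 varint."""
    out = bytearray()
    while True:
        b = n & 0x7F           # low 7 bits of the value
        n >>= 7
        if n:
            out.append(b | 0x80)   # high bit set: more bytes follow
        else:
            out.append(b)          # high bit clear: last byte
            return bytes(out)

def decode_varint(buf):
    """Decode a varint from the front of buf; returns (value, bytes_consumed)."""
    value = shift = 0
    for i, b in enumerate(buf):
        value |= (b & 0x7F) << shift
        if not (b & 0x80):
            return value, i + 1
        shift += 7
    raise ValueError("truncated varint")

# Header ids below 128 cost a single byte on the wire; larger ids grow gradually.
assert len(encode_varint(127)) == 1
assert len(encode_varint(128)) == 2
assert decode_varint(encode_varint(300)) == (300, 2)
```

This is the tradeoff radai raises below: every client implementation (C, Python, Go, ...) would need these few extra lines, in exchange for one-byte encoding of small header lengths and ids.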


On Wed, Nov 9, 2016 at 11:38 AM, radai <radai.rosenbl...@gmail.com> wrote:

> I think this discussion is getting a bit into the weeds on technical
> implementation details.
> I'd like to step back a minute and try to establish where we are in the
> larger picture:
>
> (re-wording nacho's last paragraph)
> 1. are we all in agreement that headers are a worthwhile and useful
> addition to have? this was contested early on
> 2. are we all in agreement on headers as top level entity vs headers
> squirreled-away in V?
>
> if there are still concerns around these #2 points (#jay? #jun?)?
>
> (and now back to our normal programming ...)
>
> varints are nice. having said that, it's adding complexity (see
> https://github.com/addthis/stream-lib/blob/master/src/
> main/java/com/clearspring/analytics/util/Varint.java
> as the 1st google result) and would require anyone writing other clients (C?
> Python? Go? Bash? ;-) ) to get/implement the same, for relatively
> little gain (int vs string is an order of magnitude; this isn't).
>
> int namespacing vs {int, int} namespacing are basically the same thing -
> you're just namespacing an int64 and giving people whole 2^32 ranges at a
> time. the part i like about this is letting people have a large swath of
> numbers with one registration so they don't have to come back for every
> single plugin/header they want to "reserve".
>
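radai's point that {int, int} namespacing is just a partitioned int64 can be made concrete: a 32-bit namespace and a 32-bit local id pack into one 64-bit key, so a single registration grants a whole 2^32 range of ids. A small Python sketch (hypothetical helper names):

```python
def pack_header_id(namespace, local_id):
    """Combine a 32-bit namespace and a 32-bit local id into one 64-bit key."""
    assert 0 <= namespace < 2**32 and 0 <= local_id < 2**32
    return (namespace << 32) | local_id

def unpack_header_id(key):
    """Split a 64-bit key back into (namespace, local_id)."""
    return key >> 32, key & 0xFFFFFFFF

# One registered namespace (e.g. 42) covers 2^32 local header ids.
key = pack_header_id(42, 7)
assert unpack_header_id(key) == (42, 7)
```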
>
> On Wed, Nov 9, 2016 at 11:01 AM, Roger Hoover <roger.hoo...@gmail.com>
> wrote:
>
> > Since some of the debate has been about overhead + performance, I'm
> > wondering if we have considered a varint encoding (
> > https://developers.google.com/protocol-buffers/docs/encoding#varints)
> for
> > the header length field (int32 in the proposal) and for header ids?  If
> you
> > don't use headers, the overhead would be a single byte and for each
> header
> > id < 128 would also need only a single byte?
> >
> >
> >
> > On Wed, Nov 9, 2016 at 6:43 AM, radai <radai.rosenbl...@gmail.com>
> wrote:
> >
> > > @magnus - and very dangerous (you're essentially downloading and
> > > executing arbitrary code off the internet on your servers ... bad idea
> > > without a sandbox, even with)
> > >
> > > as for it being a purely administrative task - i disagree.
> > >
> > > i wish it would, really, because then my earlier point on the complexity
> > > of the remapping process would be invalid, but at linkedin, for example,
> > > we (the team i'm in) run kafka as a service. we don't really know what
> > > our users (developing applications that use kafka) are up to at any
> > > given moment. it is very possible (given the existence of headers and a
> > > corresponding plugin ecosystem) for some application to "equip" their
> > > producers and consumers with the required plugin without us knowing. i
> > > don't mean to imply that's bad, i just want to make the point that it's
> > > not that simple to keep in sync across a large-enough organization.
> > >
> > >
> > > On Wed, Nov 9, 2016 at 6:17 AM, Magnus Edenhill <mag...@edenhill.se>
> > > wrote:
> > >
> > > > I think there is a piece missing in the Strings discussion, where
> > > > pro-Stringers reason that by providing unique string identifiers for
> > > > each header everything will just magically work for all parts of the
> > > > stream pipeline.
> > > >
> > > > But the strings don't mean anything by themselves, and while we could
> > > > probably envision some auto plugin loader that downloads, compiles,
> > > > links and runs plugins on-demand as soon as they're seen by a
> > > > consumer, I don't really see a use-case for something so dynamic (and
> > > > fragile) in practice.
> > > >
> > > > In the real world an applicatio

Re: [DISCUSS] KIP-82 - Add Record Headers

2016-11-09 Thread Roger Hoover
t; >> >> >>>>>>> collections are being used.  The overall amount of time
> > spent
> > > >> >> >>> parsing
> > > >> >> >>>> a
> > > >> >> >>>>>> set
> > > >> >> >>>>>>> of header key, value pairs probably does not matter
> unless
> > > you
> > > >> are
> > > >> >> >>>>>> getting
> > > >> >> >>>>>>> close to 1M messages per consumer.  In which case
> probably
> > > >> don't
> > > >> >> use
> > > >> >> >>>>>>> headers.  There is also the option to use very short
> > strings;
> > > >> some
> > > >> >> >>>> that
> > > >> >> >>>>>> are
> > > >> >> >>>>>>> even shorter than integers.
> > > >> >> >>>>>>>
> > > >> >> >>>>>>> Partitioning the string key space will be easier than
> > > >> partitioning
> > > >> >> >>> an
> > > >> >> >>>>>>> integer key space. We won't need a global registry.
> Kafka
> > > >> >> >>> internally
> > > >> >> >>>> can
> > > >> >> >>>>>>> reserve some prefix like "_" as its namespace.  Everyone
> > else
> > > >> can
> > > >> >> >>> use
> > > >> >> >>>>>> their
> > > >> >> >>>>>>> company or project name as namespace prefix and life
> should
> > > be
> > > >> >> good.
> > > >> >> >>>>>>>
> > > >> >> >>>>>>> Here's the link to some of the benchmarking info:
> > > >> >> >>>>>>> https://docs.google.com/document/d/1tfT-
> > > >> >> >>>> 6SZdnKOLyWGDH82kS30PnUkmgb7nPL
> > > >> >> >>>>>> dw6p65pAI/edit?usp=sharing
> > > >> >> >>>>>>>
> > > >> >> >>>>>>>
> > > >> >> >>>>>>>
> > > >> >> >>>>>>> --
> > > >> >> >>>>>>> Sean McCauliff
> > > >> >> >>>>>>> Staff Software Engineer
> > > >> >> >>>>>>> Kafka
> > > >> >> >>>>>>>
> > > >> >> >>>>>>> smccaul...@linkedin.com
> > > >> >> >>>>>>> linkedin.com/in/sean-mccauliff-b563192
> > > >> >> >>>>>>>
> > > >> >> >>>>>>> On Mon, Nov 7, 2016 at 11:51 PM, Michael Pearce <
> > > >> >> >>>> michael.pea...@ig.com>
> > > >> >> >>>>>>> wrote:
> > > >> >> >>>>>>>
> > > >> >> >>>>>>>> +1 on this slimmer version of our proposal
> > > >> >> >>>>>>>>
> > > >> >> >>>>>>>> I def think the Id space we can reduce from the proposed
> > > >> >> >>>> int32(4bytes)
> > > >> >> >>>>>>>> down to int16(2bytes) it saves on space and as headers
> we
> > > >> wouldn't
> > > >> >> >>>>>> expect
> > > >> >> >>>>>>>> the number of headers being used concurrently being that
> > > high.
> > > >> >> >>>>>>>>
> > > >> >> >>>>>>>> I would wonder if we should make the value byte array
> > length
> > > >> still
> > > >> >> >>>> int32
> > > >> >> >>>>>>>> though as This is the standard Max array length in Java
> > > saying
> > > >> >> that
> > > >> >> >>>> it
> > > >> >> >>>>>> is a
> > > >> >> >&g
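The quoted fragment above proposes partitioning the string key space by prefix: Kafka reserves something like "_" for internal headers, and everyone else uses a company or project name as the namespace prefix. A minimal sketch of that convention (the "_" prefix is from the discussion; the helper functions are hypothetical):

```python
KAFKA_RESERVED_PREFIX = "_"   # broker-internal namespace, per the discussion

def split_header_key(key):
    """Split a 'namespace.name' string header key into its two parts."""
    ns, sep, name = key.partition(".")
    if not sep:
        raise ValueError("header key %r has no namespace prefix" % key)
    return ns, name

def is_reserved(key):
    """True for broker-internal keys under the reserved prefix."""
    return key.startswith(KAFKA_RESERVED_PREFIX)

# A company prefix avoids any global registry: collisions only matter
# inside one organization's own namespace.
assert split_header_key("newrelic.trace_id") == ("newrelic", "trace_id")
assert is_reserved("_kafka.timestamp")
```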

Re: [DISCUSS] KIP-82 - Add Record Headers

2016-11-08 Thread Roger Hoover
naged, so whenever an application needs
> > > to add headers, or use an eco-system plugin that does, its Id
> > > allocation will need to be manually configured.
> > > This moves the allocation concern from the global space down to the
> > > organization level and avoids the risk of id conflicts.
> > > Example pseudo-config for some app:
> > > sometrackerplugin.tag.sourcev3.id=1000
> > > dbthing.tag.tablename.id=1001
> > > myschemareg.tag.schemaname.id=1002
> > > myschemareg.tag.schemaversion.id=1003
> > >
> > >
> > > Each header-writing or header-reading plugin must provide means
> > > (typically through configuration) to specify the tag for each header it
> > > uses. Defaults should be avoided.
> > > A consumer silently ignores tags it does not have a mapping for (since
> > > the binary_data can't be parsed without knowing what it is).
> > >
> > > Id range 0..999 is reserved for future use by the broker and must not
> > > be used by plugins.
> > >
> > >
> > >
> > > Broker
> > > -
> > > The broker does not process the tags (other than the standard protocol
> > > syntax verification); it simply stores and forwards them as opaque
> > > data.
> > >
> > > Standard message translation (removal of Headers) kicks in for older
> > > clients.
> > >
> > >
> > > Why not string ids?
> > > -
> > > String ids might seem like a good idea, but they:
> > >  * do not really solve uniqueness
> > >  * consume a lot of space (2-byte string length + string, per header)
> > >    to be meaningful
> > >  * don't really say anything about how to parse the tag's data, so they
> > >    are in effect useless on their own.
> > >
> > >
> > > Regards,
> > > Magnus
> > >
> > >
> > >
> > >
> > > 2016-11-07 18:32 GMT+01:00 Michael Pearce <michael.pea...@ig.com>:
> > >
> > > > Hi Roger,
> > > >
> > > > Thanks for the support.
> > > >
> > > > I think the key thing is to have a common key space to make an
> > ecosystem,
> > > > there does have to be some level of contract for people to play
> nicely.
> > > >
> > > > Having map<String, byte[]> or, as currently proposed in the KIP, a
> > > > numerical key space of map<int, byte[]> is a level of contract that
> > > > most people would expect.
> > > >
> > > > I think the example in a previous comment someone else made linking
> to
> > > AWS
> > > > blog and also implemented api where originally they didn’t have a
> > header
> > > > space but now they do, where keys are uniform but the value can be
> > > string,
> > > > int, anything is a good example.
> > > >
> > > > Having a custom MetadataSerializer is something we had played with,
> but
> > > > discounted the idea, as if you wanted everyone to work the same way
> in
> > > the
> > > > ecosystem, having to have this also customizable makes it a bit
> harder.
> > > > Think about making the whole message record custom serializable, this
> > > would
> > > > make it fairly tricky (though it would not be impossible) to make
> > > work
> > > > nicely. Having the value customizable we thought is a reasonable
> > tradeoff
> > > > here of flexibility over contract of interaction between different
> > > parties.
> > > >
> > > > Is there a particular case or benefit of having serialization
> > > customizable
> > > > that you have in mind?
> > > >
> > > > Saying this it is obviously something that could be implemented, if
> > there
> > > > is a need. If we did go this avenue I think a defaulted serializer
> > > > implementation should exist so that, per the 80:20 rule, people can just
> have
> > > the
> > > > broker and clients get default behavior.
> > > >
> > > > Cheers
> > > > Mike
> > > >
> > > > On 11/6/16, 5:25 PM, "radai" <radai.rosenbl...@gmail.com> wrote:
> > > >
> > > > making header _key_ serialization configurable potentially
> > undermines
> > > > the
> > > >

Re: [DISCUSS] KIP-82 - Add Record Headers

2016-11-07 Thread Roger Hoover
Nacho,

Thanks for the summary.  #5 is not a binary decision, right?

5a) headers could be "fully" native as proposed - meaning both clients and
brokers would be able to list all keys.
5b) headers could be inside the existing value field.  in this case, only
clients would understand the container format and brokers would remain
unchanged.
5c) headers could be inside a new "metadata" field which would be opaque
bytes as far as the core broker protocol and on-disk format (not part of
the existing value field) but understood by clients.

I guess I'm asking what the reasons are to favor 5a over 5c.  For the case
of broker plugins, those plugins could also understand the common header
format.

Cheers,

Roger


On Mon, Nov 7, 2016 at 3:25 PM, Nacho Solis <nso...@linkedin.com.invalid>
wrote:

> ​Hey Roger.
>
> The original design involved:
> 1- a header set per message (an array of key+values)
> 2- a message level API to set/get headers.
> 3- byte[] header-values
> 4- int header-keys
> 5- headers encoded at the protocol/core level
>
>
> 1- I think most (not all) people would agree that having metadata per
> message is a good thing. Headers is one way to provide this.
>
> 2- There are many use cases for the headers. Quite number of them are at
> the message level. Given this we expect the best way to do this is by
> giving an API at the message level.  Agreement is not at 100% here on
> providing an API to get/set headers available to all.  Some believe this
> should be done purely by interceptors instead of direct API calls.  How
> this "map" is presented to the user via the API can still be fine-tuned.
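A minimal sketch of what a message-level get/set header API (combining points 2-4) could look like; this is purely illustrative, not the client API under discussion:

```python
# Illustrative sketch of a message-level header API: headers are set and
# read on the record itself (point 2), with int keys (point 4) and
# opaque byte[] values (point 3). Names are hypothetical, not Kafka API.

class ProducerRecord:
    def __init__(self, topic, key, value):
        self.topic = topic
        self.key = key
        self.value = value
        self._headers = {}  # int tag -> bytes value

    def set_header(self, tag: int, value: bytes):
        self._headers[tag] = value

    def header(self, tag: int):
        # Returns None for tags that were never set
        return self._headers.get(tag)
```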
>
> 3- byte[] header values allow the encoding of anything.  This is a black
> box that does not need to be understood by anybody other than the
> plugin/code that wrote the header to start with.  A plugin, if it so
> wishes, could have a custom serializer.  So in here, if somebody wanted to
> use protobuf or avro or what have you you could do that.
>
> 4- int header keys are in the proposal. This offers a very compact
> representation with an easy ability to segment the space. Coordination is
> needed in one way or another, whether ints are used or strings are used.
> In our testing ints are faster than strings... is this performance boost
> worth it?  We have differing opinions.  A lot of people would argue that
> the flexibility of strings plus their ability to have long lengths make
> coordination easier, and that compression will take care of the overhead.
> I will make a quick note that HTTP2, which in theory uses strings as
> headers, uses static header compression, effectively using ints for the core
> headers and a precomputed Huffman table for other strings. (
> https://tools.ietf.org/html/rfc7541).
>
> 5- This is the big sticking point.  Should headers be done at the protocol
> level (native) or as a container/wrapper inside the V part of the message.
>
> Benefits of doing container:
> - no modification to the broker
> - no modification to the open source client.
>
> Benefits of doing native:
> - core can use headers (compaction, exactly-once, etc)
> - broker can have plugins
> - open source client can have plugins
> - no need to worry about aliasing (interoperability between clients that
> do and do not support headers)
>
>
> There are a few other benefits that seem to come bundled into the native
> implementation but could be made available in the container format.
>
> For example, we could develop a shared open source client that offers a
> container format. This would allow us to:
> - have other open source projects depend on headers
> - create a community to share plugins
>
> This container format client could be completely separate from Apache Kafka
> or it could be part of Apache Kafka. The people that would like to use
> headers can use that client, and the people that think it's an overhead can
> use the one without.
>
>
> Nacho
>
>
> On Mon, Nov 7, 2016 at 2:54 PM, Roger Hoover <roger.hoo...@gmail.com>
> wrote:
>
> > Radai,
> >
> > If the broker must parse headers, then I agree that the serialization
> > probably should not be configurable.  However, if the broker sees
> > metadata only as bytes and clients are the only components that serialize
> > and deserialize the headers, then pluggability seems reasonable.
> >
> > Cheers,
> >
> > Roger
> >
> > On Sun, Nov 6, 2016 at 9:25 AM, radai <radai.rosenbl...@gmail.com>
> wrote:
> >
> > > making header _key_ serialization configurable potentially undermines
> the
> > > broad usefulness of the feature (any point along the path must be able
> to
> > > read the header key

Re: [DISCUSS] KIP-82 - Add Record Headers

2016-11-07 Thread Roger Hoover
Radai,

If the broker must parse headers, then I agree that the serialization
probably should not be configurable.  However, if the broker sees
metadata only as bytes and clients are the only components that serialize
and deserialize the headers, then pluggability seems reasonable.

Cheers,

Roger

On Sun, Nov 6, 2016 at 9:25 AM, radai <radai.rosenbl...@gmail.com> wrote:

> making header _key_ serialization configurable potentially undermines the
> broad usefulness of the feature (any point along the path must be able to
> read the header keys. the values may be whatever and require more intimate
> knowledge of the code that produced specific headers, but keys should be
> universally readable).
>
> it would also make it hard to write really portable plugins - say i wrote a
> large message splitter/combiner - if i rely on key "largeMessage" and
> values of the form "1/20" someone who uses (contrived example) Map<Byte[],
> Double> wouldn't be able to re-use my code.
>
> not the end of the world within an organization, but problematic if you
> want to enable an ecosystem
>
> On Thu, Nov 3, 2016 at 2:04 PM, Roger Hoover <roger.hoo...@gmail.com>
> wrote:
>
> >  As others have laid out, I see strong reasons for a common message
> > metadata structure for the Kafka ecosystem.  In particular, I've seen
> that
> > even within a single organization, infrastructure teams often own the
> > message metadata while application teams own the application-level data
> > format.  Allowing metadata and content to have different structure and
> > evolve separately is very helpful for this.  Also, I think there's a lot
> of
> > value to having a common metadata structure shared across the Kafka
> > ecosystem so that tools which leverage metadata can more easily be shared
> > across organizations and integrated together.
> >
> > The question is, where does the metadata structure belong?  Here's my
> take:
> >
> > We change the Kafka wire and on-disk format from a (key, value) model
> to
> > a (key, metadata, value) model where all three are byte arrays from the
> > brokers point of view.  The primary reason for this is that it provides a
> > backward compatible migration path forward.  Producers can start
> populating
> > metadata fields before all consumers understand the metadata structure.
> > For people who already have custom envelope structures, they can populate
> > their existing structure and the new structure for a while as they make
> the
> > transition.
> >
> > We could stop there and let the clients plug in a KeySerializer,
> > MetadataSerializer, and ValueSerializer but I think it would also be useful
> to
> > have a default MetadataSerializer that implements a key-value model
> similar
> > to AMQP or HTTP headers.  Or we could go even further and prescribe a
> > Map<String, byte[]> or Map<String, String> data model for headers in the
> > clients (while still allowing custom serialization of the header data
> > model).
> >
> > I think this would address Radai's concerns:
> > 1. All client code would not need to be updated to know about the
> > container.
> > 2. Middleware friendly clients would have a standard header data model to
> > work with.
> > 3. KIP is required both b/c of broker changes and because of client API
> > changes.
> >
> > Cheers,
> >
> > Roger
> >
> >
> > On Wed, Nov 2, 2016 at 4:38 PM, radai <radai.rosenbl...@gmail.com>
> wrote:
> >
> > > my biggest issues with a "standard" wrapper format:
> > >
> > > 1. _ALL_ client _CODE_ (as opposed to kafka lib version) must be
> updated
> > to
> > > know about the container, because any old naive code trying to directly
> > > deserialize its own payload would keel over and die (it needs to know
> to
> > > deserialize a container, and then dig in there for its payload).
> > > 2. in order to write middleware-friendly clients that utilize such a
> > > container one would basically have to write their own producer/consumer
> > API
> > > on top of the open source kafka one.
> > > 3. if you were going to go with a wrapper format you really don't need
> to
> > > bother with a kip (just open source your own client stack from #2 above
> > so
> > > others could stop re-inventing it)
> > >
> > > On Wed, Nov 2, 2016 at 4:25 PM, James Cheng <wushuja...@gmail.com>
> > wrote:
> > >
> > > > How exactly would this work? Or maybe that's out of scope for this
> > email.
> > >
> >
>



Re: [DISCUSS] KIP-82 - Add Record Headers

2016-11-07 Thread Roger Hoover
Please see comments inline.

On Mon, Nov 7, 2016 at 9:32 AM, Michael Pearce <michael.pea...@ig.com>
wrote:

> Hi Roger,
>
> Thanks for the support.
>

Thanks for leading the discussion on this.  It's an important topic.


>
> I think the key thing is to have a common key space to make an ecosystem,
> there does have to be some level of contract for people to play nicely.
>


Agreed.  There doesn't yet seem to be agreement on whether the broker needs
to understand the metadata structure or whether it's a client-level
concept.  We could define a common spec on top of the existing Kafka
protocol and require clients to implement it if they want to be
metadata-compliant.  That would have the advantage of keeping the broker
and core protocol simpler (would require no changes).  The reason I'm in
favor of making the broker aware of metadata is that it would allow a
smooth migration as clients begin using the new metadata structure.
Serializing metadata to byte[] in the protocol makes sense to me because I
don't see any reason that the broker needs to spend CPU time parsing and
validating individual headers that it doesn't care about.  Nor should the
base wire protocol and on-disk format need to commit to a particular header
structure (when it's only needed at a higher level).

I'm not necessarily opposed to defining a key-value structure in core Kafka
but don't see a strong reason to do it there when it could be done at the
client layer (while still enabling a common metadata model across the
ecosystem).  Without a strong reason, it makes sense to keep things simpler
and more efficient for the brokers (byte arrays for keys, metadata, and
values).


> Having map<String, byte[]> or as per current proposed in kip of having a
> numerical key space of  map<int, byte[]> is a level of the contract that
> most people would expect.
>

Yes, this seems good to me too.  I'm in favor of prescribing something like
this in client APIs.


>
> I think the example in a previous comment someone else made linking to AWS
> blog and also implemented api where originally they didn’t have a header
> space but now they do, where keys are uniform but the value can be string,
> int, anything is a good example.
>
> Having a custom MetadataSerializer is something we had played with, but
> discounted the idea, as if you wanted everyone to work the same way in the
> ecosystem, having to have this also customizable makes it a bit harder.
> Think about making the whole message record custom serializable, this would
> make it fairly tricky (though it would not be impossible) to make work
> nicely. Having the value customizable we thought is a reasonable tradeoff
> here of flexibility over contract of interaction between different parties.
>
> Is there a particular case or benefit of having serialization customizable
> that you have in mind?
>

I guess this depends on whether we decide to encode individual headers in
the protocol or not.  If so, then custom serialization does not make
sense.  If metadata is a byte array, then it does.  The main reason for
allowing custom metadata serialization is that there already exist a lot of
good serialization solutions (Protobuf, Avro, HPACK, etc.). Kafka clients
could ship with a default header serializer and perhaps 80% of users will
just use it (as you said).  As long as the client API is key-value,
everything should interoperate as expected with custom serialization
without any application changes if you configure your custom metadata serde
in all your clients (brokers wouldn't care).
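As a sketch of how such a default header serde could work, clients would serialize a key-value header map into the opaque metadata byte[] that brokers store untouched. The wire layout below (count plus length-prefixed keys and values) is an assumption for illustration, not a proposed format:

```python
import struct

# Sketch of a possible *default* metadata serde: a Map[str, bytes] header
# model encoded client-side into the opaque metadata byte[]. The layout
# (4-byte count, 2-byte key length, 4-byte value length, big-endian) is
# an illustrative assumption only.

def serialize_headers(headers):
    out = bytearray(struct.pack(">I", len(headers)))
    for key, value in headers.items():
        kb = key.encode("utf-8")
        out += struct.pack(">H", len(kb)) + kb        # key length + key
        out += struct.pack(">I", len(value)) + value  # value length + value
    return bytes(out)

def deserialize_headers(data):
    (count,) = struct.unpack_from(">I", data, 0)
    pos, headers = 4, {}
    for _ in range(count):
        (klen,) = struct.unpack_from(">H", data, pos)
        pos += 2
        key = data[pos:pos + klen].decode("utf-8")
        pos += klen
        (vlen,) = struct.unpack_from(">I", data, pos)
        pos += 4
        headers[key] = bytes(data[pos:pos + vlen])
        pos += vlen
    return headers
```

Any custom serde that round-trips the same key-value model would interoperate at the application level, which is the point being made above.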



>
> Saying this it is obviously something that could be implemented, if there
> is a need. If we did go this avenue I think a defaulted serializer
> implementation should exist so that, per the 80:20 rule, people can just have the
> broker and clients get default behavior.
>
> Cheers
> Mike
>
> On 11/6/16, 5:25 PM, "radai" <radai.rosenbl...@gmail.com> wrote:
>
> making header _key_ serialization configurable potentially undermines
> the
> broad usefulness of the feature (any point along the path must be able
> to
> read the header keys. the values may be whatever and require more
> intimate
> knowledge of the code that produced specific headers, but keys should
> be
> universally readable).
>
> it would also make it hard to write really portable plugins - say i
> wrote a
> large message splitter/combiner - if i rely on key "largeMessage" and
> values of the form "1/20" someone who uses (contrived example)
> Map<Byte[],
> Double> wouldn't be able to re-use my code.
>
> not the end of the world within an organization, but problematic if
> you
> want to enable an ecosystem
>
> On Thu, Nov 3, 2016 at 2:04 PM, Roger Hoover <roge

Re: [DISCUSS] KIP-82 - Add Record Headers

2016-11-03 Thread Roger Hoover
 As others have laid out, I see strong reasons for a common message
metadata structure for the Kafka ecosystem.  In particular, I've seen that
even within a single organization, infrastructure teams often own the
message metadata while application teams own the application-level data
format.  Allowing metadata and content to have different structure and
evolve separately is very helpful for this.  Also, I think there's a lot of
value to having a common metadata structure shared across the Kafka
ecosystem so that tools which leverage metadata can more easily be shared
across organizations and integrated together.

The question is, where does the metadata structure belong?  Here's my take:

We change the Kafka wire and on-disk format from a (key, value) model to
a (key, metadata, value) model where all three are byte arrays from the
brokers point of view.  The primary reason for this is that it provides a
backward compatible migration path forward.  Producers can start populating
metadata fields before all consumers understand the metadata structure.
For people who already have custom envelope structures, they can populate
their existing structure and the new structure for a while as they make the
transition.

We could stop there and let the clients plug in a KeySerializer,
MetadataSerializer, and ValueSerializer but I think it would also be useful to
have a default MetadataSerializer that implements a key-value model similar
to AMQP or HTTP headers.  Or we could go even further and prescribe a
Map&lt;String, byte[]&gt; or Map&lt;String, String&gt; data model for headers in the
clients (while still allowing custom serialization of the header data
model).

I think this would address Radai's concerns:
1. All client code would not need to be updated to know about the container.
2. Middleware friendly clients would have a standard header data model to
work with.
3. KIP is required both b/c of broker changes and because of client API
changes.
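The (key, metadata, value) model and its migration story can be sketched as follows (names are illustrative; the broker treats all three fields as opaque bytes):

```python
from dataclasses import dataclass
from typing import Optional

# Sketch of the proposed (key, metadata, value) record model: all three
# fields are opaque byte arrays from the broker's point of view. Old
# consumers keep reading (key, value) and simply never look at metadata,
# which is what makes the migration backward compatible.

@dataclass(frozen=True)
class Record:
    key: Optional[bytes]
    metadata: Optional[bytes]  # new field; None for legacy producers
    value: Optional[bytes]

def legacy_view(record: Record):
    """What a pre-metadata consumer effectively sees."""
    return (record.key, record.value)
```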

Cheers,

Roger


On Wed, Nov 2, 2016 at 4:38 PM, radai  wrote:

> my biggest issues with a "standard" wrapper format:
>
> 1. _ALL_ client _CODE_ (as opposed to kafka lib version) must be updated to
> know about the container, because any old naive code trying to directly
> deserialize its own payload would keel over and die (it needs to know to
> deserialize a container, and then dig in there for its payload).
> 2. in order to write middleware-friendly clients that utilize such a
> container one would basically have to write their own producer/consumer API
> on top of the open source kafka one.
> 3. if you were going to go with a wrapper format you really don't need to
> bother with a kip (just open source your own client stack from #2 above so
> others could stop re-inventing it)
>
> On Wed, Nov 2, 2016 at 4:25 PM, James Cheng  wrote:
>
> > How exactly would this work? Or maybe that's out of scope for this email.
>


[jira] [Updated] (KAFKA-4361) Streams does not respect user configs for "default" params

2016-10-31 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover updated KAFKA-4361:

Description: 
For the config params in CONSUMER_DEFAULT_OVERRIDES 
(https://github.com/apache/kafka/blob/0.10.1/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L274)
 such as consumer.max.poll.records and consumer.auto.offset.reset, those 
parameters are not used and instead overridden by the defaults.  It may not 
work for some producer config values as well.

If your application sets those params in the StreamsConfig, they are not used 
in the underlying Kafka Consumers (and possibly Producers)

  was:
For the config params in CONSUMER_DEFAULT_OVERRIDES 
(https://github.com/apache/kafka/blob/0.10.1/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L274)
 such as consumer.max.poll.records and consumer.auto.offset.reset, those 
parameters are not used and instead overridden by the defaults.  It may not 
work for some producer config values as well.

If your application sets those params in the StreamsConfig, they are not used 
in the underlying Kafka Consumers 


> Streams does not respect user configs for "default" params
> --
>
> Key: KAFKA-4361
> URL: https://issues.apache.org/jira/browse/KAFKA-4361
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Roger Hoover
>Assignee: Damian Guy
>
> For the config params in CONSUMER_DEFAULT_OVERRIDES 
> (https://github.com/apache/kafka/blob/0.10.1/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L274)
>  such as consumer.max.poll.records and consumer.auto.offset.reset, those 
> parameters are not used and instead overridden by the defaults.  It may not 
> work for some producer config values as well.
> If your application sets those params in the StreamsConfig, they are not used 
> in the underlying Kafka Consumers (and possibly Producers)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
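The intended precedence for prefixed keys such as consumer.max.poll.records can be sketched as follows (a simplification of the real StreamsConfig logic; the keys and default values are illustrative of the pattern, and the bug reported above is the defaults winning over user-supplied values):

```python
# Sketch of the expected config resolution: user-supplied "consumer."-
# prefixed values should override the library's CONSUMER_DEFAULT_OVERRIDES.
# Keys/defaults are illustrative, not copied from StreamsConfig.

CONSUMER_DEFAULT_OVERRIDES = {
    "max.poll.records": "1000",
    "auto.offset.reset": "earliest",
}

def consumer_configs(streams_config):
    prefix = "consumer."
    user = {k[len(prefix):]: v for k, v in streams_config.items()
            if k.startswith(prefix)}
    merged = dict(CONSUMER_DEFAULT_OVERRIDES)
    merged.update(user)  # user values win; reversing this order reproduces the bug
    return merged
```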


[jira] [Updated] (KAFKA-4361) Streams does not respect user configs for "default" params

2016-10-31 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover updated KAFKA-4361:

Description: 
For the config params in CONSUMER_DEFAULT_OVERRIDES 
(https://github.com/apache/kafka/blob/0.10.1/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L274)
 such as consumer.max.poll.records and consumer.auto.offset.reset, those 
parameters are not used and instead overridden by the defaults.  It may not 
work for some producer config values as well.

If your application sets those params in the StreamsConfig, they are not used 
in the underlying Kafka Consumers 

  was:
For the config params in CONSUMER_DEFAULT_OVERRIDES 
(https://github.com/apache/kafka/blob/0.10.1/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L274)
 such as consumer.max.poll.records and consumer.auto.offset.reset, those 
parameters are not used and instead overridden by the defaults.  It may not 
work for some producer config values as well.

If your application sets those params in the StreamsConfig, they are not used 
in the underlying Kafka Consumers.


> Streams does not respect user configs for "default" params
> --
>
> Key: KAFKA-4361
> URL: https://issues.apache.org/jira/browse/KAFKA-4361
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Roger Hoover
>Assignee: Damian Guy
>
> For the config params in CONSUMER_DEFAULT_OVERRIDES 
> (https://github.com/apache/kafka/blob/0.10.1/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L274)
>  such as consumer.max.poll.records and consumer.auto.offset.reset, those 
> parameters are not used and instead overridden by the defaults.  It may not 
> work for some producer config values as well.
> If your application sets those params in the StreamsConfig, they are not used 
> in the underlying Kafka Consumers 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4361) Streams does not respect user configs for "default" params

2016-10-31 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover updated KAFKA-4361:

Description: 
For the config params in CONSUMER_DEFAULT_OVERRIDES 
(https://github.com/apache/kafka/blob/0.10.1/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L274)
 such as consumer.max.poll.records and consumer.auto.offset.reset, those 
parameters are not used and instead overridden by the defaults.  It may not 
work for some producer config values as well.

If your application sets those params in the StreamsConfig, they are not used 
in the underlying Kafka Consumers.

  was:
For the config params in CONSUMER_DEFAULT_OVERRIDES 
(https://github.com/apache/kafka/blob/0.10.1/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L274)
 such as consumer.max.poll.records and consumer.auto.offset.reset.  It may not 
work for some producer config values as well.

If your application sets those params in the StreamsConfig, they are not used 
in the underlying Kafka Consumers.


> Streams does not respect user configs for "default" params
> --
>
> Key: KAFKA-4361
> URL: https://issues.apache.org/jira/browse/KAFKA-4361
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Roger Hoover
>Assignee: Damian Guy
>
> For the config params in CONSUMER_DEFAULT_OVERRIDES 
> (https://github.com/apache/kafka/blob/0.10.1/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L274)
>  such as consumer.max.poll.records and consumer.auto.offset.reset, those 
> parameters are not used and instead overridden by the defaults.  It may not 
> work for some producer config values as well.
> If your application sets those params in the StreamsConfig, they are not used 
> in the underlying Kafka Consumers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4361) Streams does not respect user configs for "default" params

2016-10-31 Thread Roger Hoover (JIRA)
Roger Hoover created KAFKA-4361:
---

 Summary: Streams does not respect user configs for "default" params
 Key: KAFKA-4361
 URL: https://issues.apache.org/jira/browse/KAFKA-4361
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 0.10.1.0
Reporter: Roger Hoover
Assignee: Damian Guy


For the config params in CONSUMER_DEFAULT_OVERRIDES 
(https://github.com/apache/kafka/blob/0.10.1/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L274)
 such as consumer.max.poll.records and consumer.auto.offset.reset.  It may not 
work for some producer config values as well.

If your application sets those params in the StreamsConfig, they are not used 
in the underlying Kafka Consumers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] Add REST Server to Apache Kafka

2016-10-27 Thread Roger Hoover
-1

I think a good set of HTTP APIs can simplify client implementations and
allow for easier integration with Kafka as well as simpler ops/admin
tools.  Elasticsearch is a great example.  However, I think it's too early
to put this into the core.  The existing Kafka REST proxies (that I know
of) are based on HTTP 1.1 and require base64 encoding of binary data.  I
hope/suspect that we will see HTTP2-based Kafka-proxies but they should be
external to the core project for rapid experimentation.  Maybe someday, we
can standardize on one.  As an example, even if we decide to support HTTP
2, we still need a message framing spec on top of it.  gRPC is an option
but who knows what else will emerge.  Then there's metadata serialization
(Protobuf for gRPC).  It will be hard to get everyone to agree on these and
we probably don't need to all agree on them at this point.

For reference,
- this is the PubSub gRPC spec:  https://github.com/googleapis/googleapis/blob/master/google/pubsub/v1/pubsub.proto
- this is the message framing spec for gRPC on top of HTTP2 -
http://www.grpc.io/docs/guides/wire.html

Delimited-Message → Compressed-Flag Message-Length Message
Compressed-Flag → 0 / 1 # encoded as 1 byte unsigned integer
Message-Length → {length of Message} # encoded as 4 byte unsigned integer
Message → *{binary octet}


Cheers,

Roger

On Wed, Oct 26, 2016 at 10:21 PM, Jaikiran Pai 
wrote:

> -1.
>
> I would personally like Kafka core to be limited to the core aspects. I
> consider REST server like this more a tooling thing than a core aspect of
> Kafka and would like it to be outside of Kafka.
>
> -Jaikiran
>
>
> On Wednesday 26 October 2016 02:46 AM, Harsha Chintalapani wrote:
>
>> Hi All,
>> We are proposing to have a REST Server as part of Apache
>> Kafka
>> to provide producer/consumer/admin APIs. We strongly believe having
>> REST server functionality with Apache Kafka will help a lot of users.
>> Here is the KIP that Mani Kumar wrote
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-80:+Ka
>> fka+Rest+Server.
>> There is a discussion thread in dev list that had differing opinions on
>> whether to include REST server in Apache Kafka or not. You can read more
>> about that in this thread
>> http://mail-archives.apache.org/mod_mbox/kafka-dev/201610.mb
>> ox/%3CCAMVt_AyMqeuDM39ZnSXGKtPDdE46sowmqhsXoP-+JMBCUV74Dw@ma
>> il.gmail.com%3E
>>
>>This is a VOTE thread to check interest in the community for
>> adding REST Server implementation in Apache Kafka.
>>
>> Thanks,
>> Harsha
>>
>>
>


[jira] [Created] (KAFKA-4331) Kafka Streams resetter is slow because it joins the same group for each topic

2016-10-21 Thread Roger Hoover (JIRA)
Roger Hoover created KAFKA-4331:
---

 Summary: Kafka Streams resetter is slow because it joins the same 
group for each topic
 Key: KAFKA-4331
 URL: https://issues.apache.org/jira/browse/KAFKA-4331
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Affects Versions: 0.10.0.1, 0.10.0.0
Reporter: Roger Hoover
Assignee: Matthias J. Sax


The resetter is joining the same group for each topic which takes ~10secs in my 
testing.  This makes the reset very slow when you have a lot of topics.
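A toy cost model of why this is slow: joining the group once per topic pays the roughly-10-second rebalance each time, while a single join subscribed to all topics would pay it once. The numbers are illustrative, taken from the observation above.

```python
JOIN_SECONDS = 10  # approximate per-join cost observed in the report above

def reset_per_topic(topics: list[str]) -> int:
    # Current behavior: one group join per topic.
    return JOIN_SECONDS * len(topics)

def reset_single_join(topics: list[str]) -> int:
    # Proposed: join once, subscribed to all topics.
    return JOIN_SECONDS if topics else 0

topics = [f"topic-{i}" for i in range(30)]
```

For 30 topics that is 300 seconds of join overhead versus 10.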





[jira] [Commented] (KAFKA-3993) Console producer drops data

2016-08-30 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450054#comment-15450054
 ] 

Roger Hoover commented on KAFKA-3993:
-

Thanks, [~cotedm].  I tried to set acks=all but apparently that doesn't work. 

> Console producer drops data
> ---
>
> Key: KAFKA-3993
> URL: https://issues.apache.org/jira/browse/KAFKA-3993
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.0.0
>Reporter: Roger Hoover
>
> The console producer drops data if the process exits too quickly.  I 
> suspect that the shutdown hook does not call close() or something goes wrong 
> during that close().
> Here's a simple example to illustrate the issue:
> {noformat}
> export BOOTSTRAP_SERVERS=localhost:9092
> export TOPIC=bar
> export MESSAGES=1
> ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 1 
> --replication-factor 1 --topic "$TOPIC" \
> && echo "acks=all" > /tmp/producer.config \
> && echo "linger.ms=0" >> /tmp/producer.config \
> && seq "$MESSAGES" | ./bin/kafka-console-producer.sh --broker-list 
> "$BOOTSTRAP_SERVERS" --topic "$TOPIC" --producer-config /tmp/producer.config \
> && ./bin/kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP_SERVERS" 
> --new-consumer --from-beginning --max-messages "${MESSAGES}" --topic "$TOPIC"
> {noformat}





[jira] [Closed] (KAFKA-4063) Add support for infinite endpoints for range queries in Kafka Streams KV stores

2016-08-19 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover closed KAFKA-4063.
---

The JIRA UI was unresponsive so I accidentally submitted the form twice.

> Add support for infinite endpoints for range queries in Kafka Streams KV 
> stores
> ---
>
> Key: KAFKA-4063
> URL: https://issues.apache.org/jira/browse/KAFKA-4063
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.1.0
>    Reporter: Roger Hoover
>Assignee: Roger Hoover
>Priority: Minor
> Fix For: 0.10.1.0
>
>
> In some applications, it's useful to iterate over the key-value store either:
> 1. from the beginning up to a certain key
> 2. from a certain key to the end
> We can easily add two new methods, rangeUntil() and rangeFrom(), to support this.





[jira] [Created] (KAFKA-4063) Add support for infinite endpoints for range queries in Kafka Streams KV stores

2016-08-18 Thread Roger Hoover (JIRA)
Roger Hoover created KAFKA-4063:
---

 Summary: Add support for infinite endpoints for range queries in 
Kafka Streams KV stores
 Key: KAFKA-4063
 URL: https://issues.apache.org/jira/browse/KAFKA-4063
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Affects Versions: 0.10.1.0
Reporter: Roger Hoover
Assignee: Roger Hoover
Priority: Minor
 Fix For: 0.10.1.0


In some applications, it's useful to iterate over the key-value store either:
1. from the beginning up to a certain key
2. from a certain key to the end

We can easily add two new methods, rangeUntil() and rangeFrom(), to support this.
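A sketch of the proposed semantics over a sorted in-memory store. The method names follow the JIRA; the real state stores are Java (RocksDB-backed), so this Python version is purely an illustration of the range endpoints.

```python
import bisect

class SortedKVStore:
    """Toy key-value store with half-open range queries at either end."""

    def __init__(self, items: dict):
        self._keys = sorted(items)
        self._items = dict(items)

    def range_from(self, key):
        # All entries from `key` (inclusive) to the end of the store.
        i = bisect.bisect_left(self._keys, key)
        return [(k, self._items[k]) for k in self._keys[i:]]

    def range_until(self, key):
        # All entries from the beginning up to `key` (inclusive).
        i = bisect.bisect_right(self._keys, key)
        return [(k, self._items[k]) for k in self._keys[:i]]

store = SortedKVStore({"a": 1, "c": 3, "e": 5})
```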





[jira] [Created] (KAFKA-4064) Add support for infinite endpoints for range queries in Kafka Streams KV stores

2016-08-18 Thread Roger Hoover (JIRA)
Roger Hoover created KAFKA-4064:
---

 Summary: Add support for infinite endpoints for range queries in 
Kafka Streams KV stores
 Key: KAFKA-4064
 URL: https://issues.apache.org/jira/browse/KAFKA-4064
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Affects Versions: 0.10.1.0
Reporter: Roger Hoover
Assignee: Roger Hoover
Priority: Minor
 Fix For: 0.10.1.0


In some applications, it's useful to iterate over the key-value store either:
1. from the beginning up to a certain key
2. from a certain key to the end

We can easily add two new methods, rangeUntil() and rangeFrom(), to support this.





[jira] [Commented] (KAFKA-3993) Console producer drops data

2016-08-10 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415811#comment-15415811
 ] 

Roger Hoover commented on KAFKA-3993:
-

Thanks, [~vahid].  Yeah, it looks the same.

> Console producer drops data
> ---
>
> Key: KAFKA-3993
> URL: https://issues.apache.org/jira/browse/KAFKA-3993
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.0.0
>Reporter: Roger Hoover
>
> The console producer drops data if the process exits too quickly.  I 
> suspect that the shutdown hook does not call close() or something goes wrong 
> during that close().
> Here's a simple example to illustrate the issue:
> {noformat}
> export BOOTSTRAP_SERVERS=localhost:9092
> export TOPIC=bar
> export MESSAGES=1
> ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 1 
> --replication-factor 1 --topic "$TOPIC" \
> && echo "acks=all" > /tmp/producer.config \
> && echo "linger.ms=0" >> /tmp/producer.config \
> && seq "$MESSAGES" | ./bin/kafka-console-producer.sh --broker-list 
> "$BOOTSTRAP_SERVERS" --topic "$TOPIC" --producer-config /tmp/producer.config \
> && ./bin/kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP_SERVERS" 
> --new-consumer --from-beginning --max-messages "${MESSAGES}" --topic "$TOPIC"
> {noformat}





[jira] [Updated] (KAFKA-3993) Console producer drops data

2016-08-10 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover updated KAFKA-3993:

Description: 
The console producer drops data if the process exits too quickly.  I 
suspect that the shutdown hook does not call close() or something goes wrong 
during that close().

Here's a simple example to illustrate the issue:

{noformat}
export BOOTSTRAP_SERVERS=localhost:9092
export TOPIC=bar
export MESSAGES=1
./bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 1 
--replication-factor 1 --topic "$TOPIC" \
&& echo "acks=all" > /tmp/producer.config \
&& echo "linger.ms=0" >> /tmp/producer.config \
&& seq "$MESSAGES" | ./bin/kafka-console-producer.sh --broker-list 
"$BOOTSTRAP_SERVERS" --topic "$TOPIC" --producer-config /tmp/producer.config \
&& ./bin/kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP_SERVERS" 
--new-consumer --from-beginning --max-messages "${MESSAGES}" --topic "$TOPIC"
{noformat}

  was:
The console producer drops data if the process exits too quickly.  I 
suspect that the shutdown hook does not call close() or something goes wrong 
during that close().

Here's a simple example to illustrate the issue:

{noformat}
export BOOTSTRAP_SERVERS=localhost:9092
export TOPIC=bar
export MESSAGES=1
./bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 1 
--replication-factor 1 --topic "$TOPIC" \
&& echo "acks=all" > /tmp/producer.config \
&& echo "linger.ms=0" >> /tmp/producer.config \
&& seq "$MESSAGES" | ./bin/kafka-console-producer.sh --broker-list 
"$BOOTSTRAP_SERVERS" --topic "$TOPIC" \
&& ./bin/kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP_SERVERS" 
--new-consumer --from-beginning --max-messages "${MESSAGES}" --topic "$TOPIC"
{noformat}


> Console producer drops data
> -------
>
> Key: KAFKA-3993
> URL: https://issues.apache.org/jira/browse/KAFKA-3993
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.0.0
>Reporter: Roger Hoover
>
> The console producer drops data if the process exits too quickly.  I 
> suspect that the shutdown hook does not call close() or something goes wrong 
> during that close().
> Here's a simple example to illustrate the issue:
> {noformat}
> export BOOTSTRAP_SERVERS=localhost:9092
> export TOPIC=bar
> export MESSAGES=1
> ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 1 
> --replication-factor 1 --topic "$TOPIC" \
> && echo "acks=all" > /tmp/producer.config \
> && echo "linger.ms=0" >> /tmp/producer.config \
> && seq "$MESSAGES" | ./bin/kafka-console-producer.sh --broker-list 
> "$BOOTSTRAP_SERVERS" --topic "$TOPIC" --producer-config /tmp/producer.config \
> && ./bin/kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP_SERVERS" 
> --new-consumer --from-beginning --max-messages "${MESSAGES}" --topic "$TOPIC"
> {noformat}





[jira] [Commented] (KAFKA-3752) Provide a way for KStreams to recover from unclean shutdown

2016-08-02 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404439#comment-15404439
 ] 

Roger Hoover commented on KAFKA-3752:
-

[~guozhang] Your assessment seems correct.  It happened again when I 
restarted after a clean shutdown (SIGTERM + wait for exit). 

1.  We have a single KafkaStreams instance with 8 threads.
2. Here's the full log:  
https://gist.github.com/theduderog/f9ab4767cd3b098d404f5513a7e1c27e

> Provide a way for KStreams to recover from unclean shutdown
> ---
>
> Key: KAFKA-3752
> URL: https://issues.apache.org/jira/browse/KAFKA-3752
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Roger Hoover
>  Labels: architecture
>
> If a KStream application gets killed with SIGKILL (e.g. by the Linux OOM 
> Killer), it may leave behind lock files and fail to recover.
> It would be useful to have an option (say --force) to tell KStreams to 
> proceed even if it finds old LOCK files.
> {noformat}
> [2016-05-24 17:37:52,886] ERROR Failed to create an active task #0_0 in 
> thread [StreamThread-1]:  
> (org.apache.kafka.streams.processor.internals.StreamThread:583)
> org.apache.kafka.streams.errors.ProcessorStateException: Error while creating 
> the state manager
>   at 
> org.apache.kafka.streams.processor.internals.AbstractTask.(AbstractTask.java:71)
>   at 
> org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:86)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:550)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:577)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.access$000(StreamThread.java:68)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:123)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:222)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:232)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:227)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$2.onSuccess(RequestFuture.java:182)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:436)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:422)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:679)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:658)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>   at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:426)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:2
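The suggested --force behavior could be sketched over a plain file lock. This is illustrative Python only, not the actual Kafka Streams state-directory lock implementation; the function and exception names are hypothetical.

```python
import os
import tempfile

class StaleLockError(Exception):
    """Raised when a leftover LOCK file blocks startup."""

def acquire_state_lock(state_dir: str, force: bool = False) -> str:
    # A LOCK file left behind by a SIGKILLed process blocks startup
    # unless --force tells us to discard it and proceed.
    lock_path = os.path.join(state_dir, "LOCK")
    if os.path.exists(lock_path):
        if not force:
            raise StaleLockError(f"found existing lock: {lock_path}")
        os.remove(lock_path)  # --force: assume the previous owner is dead
    open(lock_path, "w").close()
    return lock_path

state_dir = tempfile.mkdtemp()
acquire_state_lock(state_dir)  # clean start creates the lock
# ... process is SIGKILLed here; the LOCK file is left behind ...
recovered = acquire_state_lock(state_dir, force=True)  # recovery succeeds
```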

[jira] [Created] (KAFKA-3993) Console producer drops data

2016-07-26 Thread Roger Hoover (JIRA)
Roger Hoover created KAFKA-3993:
---

 Summary: Console producer drops data
 Key: KAFKA-3993
 URL: https://issues.apache.org/jira/browse/KAFKA-3993
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 0.10.0.0
Reporter: Roger Hoover


The console producer drops data if the process exits too quickly.  I 
suspect that the shutdown hook does not call close() or something goes wrong 
during that close().

Here's a simple example to illustrate the issue:

{noformat}
export BOOTSTRAP_SERVERS=localhost:9092
export TOPIC=bar
export MESSAGES=1
./bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 1 
--replication-factor 1 --topic "$TOPIC" \
&& echo "acks=all" > /tmp/producer.config \
&& echo "linger.ms=0" >> /tmp/producer.config \
&& seq "$MESSAGES" | ./bin/kafka-console-producer.sh --broker-list 
"$BOOTSTRAP_SERVERS" --topic "$TOPIC" \
&& ./bin/kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP_SERVERS" 
--new-consumer --from-beginning --max-messages "${MESSAGES}" --topic "$TOPIC"
{noformat}
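The failure mode described above resembles any buffered writer that exits without flushing. A Python analogy (not Kafka code): `os._exit` bypasses atexit handlers and stream flushing, much like a shutdown hook that never reaches `close()`.

```python
import os
import subprocess
import sys
import tempfile

def write_and_exit(path: str, flush: bool) -> None:
    # Run a child process that writes through a block-buffered file
    # and then exits abruptly, skipping all cleanup.
    body = "f.flush(); os.fsync(f.fileno())\n" if flush else ""
    code = (
        "import os\n"
        f"f = open({path!r}, 'w')\n"
        "f.write('hello')\n"
        f"{body}"
        "os._exit(0)\n"  # no atexit, no implicit flush, like SIGKILL
    )
    subprocess.run([sys.executable, "-c", code], check=True)

tmp = tempfile.mkdtemp()
lost = os.path.join(tmp, "lost.txt")
kept = os.path.join(tmp, "kept.txt")
write_and_exit(lost, flush=False)  # buffered data never reaches disk
write_and_exit(kept, flush=True)   # explicit flush persists it
```

The unflushed file ends up empty even though the write call "succeeded", which is the same shape of loss the console producer shows.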





Re: [DISCUSS] Client Side Auto Topic Creation

2016-06-29 Thread Roger Hoover
My comments go a bit beyond just topic creation but I'd like to see Kafka
make it easier for application developers to specify their requirements
declaratively in a single place.  Today, for example, if your application
requires strong guarantees against data loss, you must set a mix of
topic-level configs (replication factor, min.in.sync.replicas, retention.ms)
and client configs (acks=all and
possibly max.in.flight.requests.per.connection if you care about
ordering).  This can be complicated by organizational structure where you
have a different team (SREs) responsible for the cluster configs and
perhaps topic creation and application teams responsible for the client
settings.  Let's say that you get all the settings right up front.  How
would you know if they were later changed incorrectly?  How do admins know
which topics are OK to add more partitions to and which are not?  How do
downstream applications know how much retention they can rely on for
re-processing in their upstream topics?

I think it's useful to consider the typical roles in an organization.  Say
we have an SRE team responsible for overall cluster health, capacity, etc.
This team likely has elevated privileges and perhaps wants to
review/approve settings for new topics to make sure they're sane.

The application developer may not care about some of the details of topic
creation but does care in as much as they affect the application
correctness and SLAs.  It's more than just number of partitions and
replication factor.  The application may require
1) some of its topics to be compacted to function correctly, with
min.compaction.lag.ms (KIP-58) set correctly
2) retention.ms set correctly on some of its topics to satisfy its
failure/re-processing SLAs
3) partitioning of its input topics to match its expectations
4) the data format to match expectations

I realize that #3 and #4 are unrelated to topic creation but they're part
of a set of invariants that the application needs enforced and should fail
early if their requirements are not met.  For example, with semantically
partitioned topics, the application may break if new partitions are added.
The issue is that there is no standard mechanism or convention to
communicate application requirements so that admins and application teams
can verify that they continue to be met over time.

Imagine for a second that Kafka allowed arbitrary tags to be associated with
topics.  An application could now define a specification for its
interaction with Kafka including topic names, min replication factors,
fault tolerance settings (replication factors, min.in.sync.replicas,
producer acks), compacted yes/no, topic retention settings, can add/remove
partitions, partition key, and data format.  Some of these requirements map
onto topic configs, some (like acks=all) are producer settings, and some
(like partition key and data format) could be organizational conventions
stored as tags (format:avro).

For organizations where only SREs/admins can create/modify topics, this
spec allows them to do their job while being sure they're not breaking the
application.  The application can verify on startup that its requirements
are satisfied and fail early if not.  If the application has permission to
create its own topics, then the spec is a declarative format for doing so
and will not require the same topic-creation boilerplate code to be
duplicated in every application.

If people like this approach, perhaps we could define a topic spec (if all
fields besides topic name are empty, it uses "cluster defaults").  Then the
AdminClient would have an idempotent create method that takes a spec and
verifies that the spec is already met, tries to create topics to meet the
spec, or fails saying it cannot be met.  Perhaps the producer and consumer
APIs would only have a verify() method which checks whether the spec is
satisfied.

Cheers,

Roger

On Wed, Jun 29, 2016 at 8:50 AM, Grant Henke  wrote:

> Thanks for the discussion, below are some thoughts and responses.
>
> One of the problems that we currently have with
> > the clients is that we retry silently on unknown topics under the
> > expectation that they will eventually be created (automatically or not).
> > This makes it difficult to detect misconfiguration without looking for
> > warnings in the logs. This problem is compounded if the client isn't
> > authorized to the topic since then we don't actually know if the topic
> > exists or not and whether it is reasonable to keep retrying.
>
>
> Yeah, this is a problem that's difficult and opaque to the user. I think any
> of the proposed solutions would help solve this issue, since the create
> would be done at the metadata request phase instead of in the produce
> response handling. And if the create fails, the user would receive a much
> clearer authorization error.
>
> The current auto creation of topic by the broker appear to be the only
> > reason an unknown topic error is retriable
> > 

[DISCUSS] KAFKA-3761: Controller has RunningAsBroker instead of RunningAsController state

2016-06-23 Thread Roger Hoover
Hi all,

Does anyone have an issue with removing the broker state called
"RunningAsController"?

The reasons to remove it are:
1. It's currently broken.  The purpose of the JIRA
 was to report that the
RunningAsController state gets overwritten back to "RunningAsBroker".
2. It's not a useful state.
  a. If clients want to use this metric to know whether a broker is ready
to receive requests or not, they do not care whether or not the broker is
the controller
  b. there is already a separate boolean property, KafkaController.isActive
which contains this information.

Thanks,

Roger


Re: [DISCUSS] KIP-63: Unify store and downstream caching in streams

2016-06-21 Thread Roger Hoover
Thanks, Eno.

On Tue, Jun 21, 2016 at 2:22 AM, Eno Thereska <eno.there...@gmail.com>
wrote:

> Hi Roger,
>
> I realised I never got back to you on this one, sorry. Some answers inline:
>
> > On 3 Jun 2016, at 22:48, Roger Hoover <roger.hoo...@gmail.com> wrote:
> >
> > Hi Eno,
> >
> > Does this mean that Kafka Streams will disable the RocksDB write buffer?
>
> For the purposes of this KIP we might not want to change the current
> RocksDB state of things. However, Guozhang has written about future plans
> in the memory management page <
> https://cwiki.apache.org/confluence/display/KAFKA/Discussion:+Memory+Management+in+Kafka+Streams>
> in the wiki.
>
>
> > Is it currently safe to recover a Kafka Streams application after SIGKILL
> > on the same machine?  If not, will this make it safe to do so?
> >
> > If RocksDB is not flushed before offsets are committed in Kafka and is
> > killed with SIGKILL, will the data in the write buffer be lost (since
> Kafka
> > Streams disables the transaction log)?  That data will be present in the
> > Kafka changelog but will it get applied to the recovered RocksDB database
> > on restart?
>
> These are good questions on failure modes. This KIP will not change the
> failure behaviour from what it currently is. I believe there will be
> subsequent KIPs where the issues around writing atomically to multiple
> places will be considered, and failure semantics will be strengthened. Stay
> tuned.
>
> Thanks
> Eno
>
>
>
> >
> > Thanks,
> >
> > Roger
> >
> > On Fri, Jun 3, 2016 at 2:39 PM, Eno Thereska <eno.there...@gmail.com>
> wrote:
> >
> >> Hi Gwen,
> >>
> >> Yes. As an example, if cache.max.bytes.buffering set to X, and if users
> >> have A aggregation operators and T KTable.to() operators, then X*(A + T)
> >> total bytes will be allocated for caching.
> >>
> >> Eno
> >>
> >>> On 3 Jun 2016, at 21:37, Gwen Shapira <g...@confluent.io> wrote:
> >>>
> >>> Just to clarify: "cache.max.bytes.buffering" is per processor?
> >>>
> >>>
> >>> On Thu, Jun 2, 2016 at 11:30 AM, Eno Thereska <eno.there...@gmail.com>
> >> wrote:
> >>>> Hi there,
> >>>>
> >>>> I have created KIP-63: Unify store and downstream caching in streams
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-63%3A+Unify+store+and+downstream+caching+in+streams
> >> <
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-63:+Unify+store+and+downstream+caching+in+streams
> >>>
> >>>>
> >>>>
> >>>> Feedback is appreciated.
> >>>>
> >>>> Thank you
> >>>> Eno
> >>
> >>
>
>
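The cache-sizing rule quoted in this thread (cache.max.bytes.buffering of X per caching operator, X*(A + T) in total) works out as a quick calculation. This mirrors the arithmetic as described in the thread at the time, not necessarily the final Streams semantics.

```python
def total_cache_bytes(cache_max_bytes_buffering: int,
                      aggregation_ops: int, ktable_to_ops: int) -> int:
    # X * (A + T), per the example given by Eno above.
    return cache_max_bytes_buffering * (aggregation_ops + ktable_to_ops)

# e.g. 10 MB per operator, 4 aggregations, 2 KTable.to() operators
total = total_cache_bytes(10 * 1024 * 1024, 4, 2)
```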


[jira] [Commented] (KAFKA-3740) Add configs for RocksDBStores

2016-06-16 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334812#comment-15334812
 ] 

Roger Hoover commented on KAFKA-3740:
-

[~h...@pinterest.com], any update on this?  I'm wondering if you have any idea 
when it might be available.

> Add configs for RocksDBStores
> -
>
> Key: KAFKA-3740
> URL: https://issues.apache.org/jira/browse/KAFKA-3740
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Henry Cai
>  Labels: api, newbie
>
> Today most of the rocksDB configs are hard written inside {{RocksDBStore}}, 
> or the default values are directly used. We need to make them configurable 
> for advanced users. For example, some default values may not work perfectly 
> for some scenarios: 
> https://github.com/HenryCaiHaiying/kafka/commit/ccc4e25b110cd33eea47b40a2f6bf17ba0924576
>  
> One way of doing that is to introduce a "RocksDBStoreConfigs" objects similar 
> to "StreamsConfig", which defines all related rocksDB options configs, that 
> can be passed as key-value pairs to "StreamsConfig".
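One sketch of the prefixed key-value approach suggested above (the prefix and function name are hypothetical, chosen to illustrate the idea): RocksDB-specific entries travel inside the general config map and are extracted by prefix for the store.

```python
ROCKSDB_PREFIX = "rocksdb."

def extract_rocksdb_configs(streams_configs: dict) -> dict:
    # Pull out "rocksdb."-prefixed entries for the store, stripping the prefix
    # so the remaining keys match RocksDB option names.
    return {k[len(ROCKSDB_PREFIX):]: v
            for k, v in streams_configs.items()
            if k.startswith(ROCKSDB_PREFIX)}

configs = {"application.id": "my-app",
           "rocksdb.block.cache.size": "104857600",
           "rocksdb.write.buffer.size": "33554432"}
rocks = extract_rocksdb_configs(configs)
```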





[jira] [Created] (KAFKA-3858) Add functions to print stream topologies

2016-06-16 Thread Roger Hoover (JIRA)
Roger Hoover created KAFKA-3858:
---

 Summary: Add functions to print stream topologies
 Key: KAFKA-3858
 URL: https://issues.apache.org/jira/browse/KAFKA-3858
 Project: Kafka
  Issue Type: New Feature
  Components: streams
Affects Versions: 0.10.0.0
Reporter: Roger Hoover
Assignee: Guozhang Wang


For debugging and development, it would be very useful to be able to print 
Kafka Streams topologies.  At a minimum, it would be great to be able to see 
the logical topology, including Kafka topics linked by sub-topologies.  I 
think that this information does not depend on partitioning.  For more detail, 
it would be great to be able to print the same logical topology but also 
showing the number of tasks (and perhaps task IDs?).  Finally, it would be 
great to show the physical topology after the tasks have been mapped to JVMs + 
threads.
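A toy of what such a logical-topology printout might look like, purely illustrative (Streams later added Topology#describe() for this; the graph and names below are made up):

```python
def describe_topology(edges: dict[str, list[str]]) -> str:
    # Render each node and its downstream children, depth-first from sources.
    lines = []

    def walk(node: str, depth: int) -> None:
        lines.append("  " * depth + node)
        for child in edges.get(node, []):
            walk(child, depth + 1)

    # Sources: nodes that never appear as anyone's child.
    sources = [n for n in edges
               if not any(n in kids for kids in edges.values())]
    for s in sources:
        walk(s, 0)
    return "\n".join(lines)

topology = {
    "topic:orders": ["map-values"],
    "map-values": ["aggregate"],
    "aggregate": ["topic:order-totals"],
}
printed = describe_topology(topology)
```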





Re: [DISCUSS] KIP-63: Unify store and downstream caching in streams

2016-06-03 Thread Roger Hoover
Hi Eno,

Does this mean that Kafka Streams will disable the RocksDB write buffer?

Is it currently safe to recover a Kafka Streams application after SIGKILL
on the same machine?  If not, will this make it safe to do so?

If RocksDB is not flushed before offsets are committed in Kafka and is
killed with SIGKILL, will the data in the write buffer be lost (since Kafka
Streams disables the transaction log)?  That data will be present in the
Kafka changelog but will it get applied to the recovered RocksDB database
on restart?

Thanks,

Roger

On Fri, Jun 3, 2016 at 2:39 PM, Eno Thereska  wrote:

> Hi Gwen,
>
> Yes. As an example, if cache.max.bytes.buffering set to X, and if users
> have A aggregation operators and T KTable.to() operators, then X*(A + T)
> total bytes will be allocated for caching.
>
> Eno
>
> > On 3 Jun 2016, at 21:37, Gwen Shapira  wrote:
> >
> > Just to clarify: "cache.max.bytes.buffering" is per processor?
> >
> >
> > On Thu, Jun 2, 2016 at 11:30 AM, Eno Thereska 
> wrote:
> >> Hi there,
> >>
> >> I have created KIP-63: Unify store and downstream caching in streams
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-63%3A+Unify+store+and+downstream+caching+in+streams
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-63:+Unify+store+and+downstream+caching+in+streams
> >
> >>
> >>
> >> Feedback is appreciated.
> >>
> >> Thank you
> >> Eno
>
>


[jira] [Work started] (KAFKA-3761) Controller has RunningAsBroker instead of RunningAsController state

2016-05-26 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-3761 started by Roger Hoover.
---
> Controller has RunningAsBroker instead of RunningAsController state
> ---
>
> Key: KAFKA-3761
> URL: https://issues.apache.org/jira/browse/KAFKA-3761
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>    Assignee: Roger Hoover
>
> In `KafkaServer.start`, we start `KafkaController`:
> {code}
> /* start kafka controller */
> kafkaController = new KafkaController(config, zkUtils, brokerState, 
> kafkaMetricsTime, metrics, threadNamePrefix)
> kafkaController.startup()
> {code}
> Which sets the state to `RunningAsController` in 
> `KafkaController.onControllerFailover`:
> `brokerState.newState(RunningAsController)`
> And this later gets set to `RunningAsBroker`.
> This doesn't match the diagram in `BrokerStates`. [~junrao] suggested that we 
> should start the controller after we register the broker in ZK, but this 
> seems tricky as we need the controller in `KafkaApis`.





[jira] [Issue Comment Deleted] (KAFKA-3760) Set broker state as running after publishing to ZooKeeper

2016-05-26 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover updated KAFKA-3760:

Comment: was deleted

(was: Create PR:  https://github.com/apache/kafka/pull/1436)

> Set broker state as running after publishing to ZooKeeper
> -
>
> Key: KAFKA-3760
> URL: https://issues.apache.org/jira/browse/KAFKA-3760
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.0.0
>Reporter: Roger Hoover
>Assignee: Jun Rao
>Priority: Minor
>
> Currently, the broker state is set to running before it registers itself in 
> ZooKeeper. This is too early in the broker lifecycle. If clients use the 
> broker state as an indicator that the broker is ready to accept requests, 
> they will get errors. This change is to delay setting the broker state to 
> running until it's registered in ZK.
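The proposed ordering, marking the broker running only after ZooKeeper registration, can be sketched as a tiny state machine. This is a simplified illustration (the real broker lifecycle has more states and real ZK interaction); class and method names are hypothetical.

```python
from enum import Enum

class BrokerState(Enum):
    STARTING = 1
    RUNNING_AS_BROKER = 2

class Broker:
    def __init__(self):
        self.state = BrokerState.STARTING
        self.registered_in_zk = False

    def register_in_zookeeper(self) -> None:
        # Clients can only discover the broker after this step.
        self.registered_in_zk = True

    def startup(self) -> None:
        # Proposed ordering: register first, then advertise "running",
        # so the state is never "running" while undiscoverable.
        self.register_in_zookeeper()
        self.state = BrokerState.RUNNING_AS_BROKER

b = Broker()
b.startup()
```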





[jira] [Commented] (KAFKA-3760) Set broker state as running after publishing to ZooKeeper

2016-05-26 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303141#comment-15303141
 ] 

Roger Hoover commented on KAFKA-3760:
-

Created PR:  https://github.com/apache/kafka/pull/1436

> Set broker state as running after publishing to ZooKeeper
> -
>
> Key: KAFKA-3760
> URL: https://issues.apache.org/jira/browse/KAFKA-3760
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.0.0
>Reporter: Roger Hoover
>Assignee: Jun Rao
>Priority: Minor
>
> Currently, the broker state is set to running before it registers itself in 
> ZooKeeper. This is too early in the broker lifecycle. If clients use the 
> broker state as an indicator that the broker is ready to accept requests, 
> they will get errors. This change is to delay setting the broker state to 
> running until it's registered in ZK.




