Re: mirror maker against 0.8.2 source cluster and 0.9.0 destination cluster

2016-01-06 Thread Ismael Juma
Hi Stephen,

Newer brokers support older clients, but not the other way around. You
could try 0.8.2 MirrorMaker against 0.8.2 source and 0.9.0 target clusters
perhaps?

Ismael
On 6 Jan 2016 11:18, "Stephen Powis"  wrote:

> Hey!
>
> So I'm trying to get mirror maker going between two different clusters.  My
> source cluster is version 0.8.2 and my destination cluster is 0.9.0 running
> the mirror maker code from the 0.9.0 release.  Does anyone know if this is
> possible to do?  I'm aware that the protocol changed slightly between
> versions.
>
> Attempting to run ./kafka-console-consumer.sh from the 0.9.0 release and
> consume from my 0.8.2 cluster seems to fail, which is leading me to believe
> that mirror maker will have the same issue.
>
> Attached are the errors I receive from kafka-console-consumer.sh running
> from the 0.9.0 release against a 0.8.2 cluster.
>
> ./kafka-console-consumer.sh  --zookeeper W.X.Y.Z:2181 --topic MyTopic
>
> [2016-01-06 06:17:40,587] WARN
>
> [ConsumerFetcherThread-console-consumer-73608_hostname-1452079050696-728e3f7e-0-100],
> Error in fetch kafka.consumer.ConsumerFetcherThread$FetchRequest@6bb31fae.
> Possible cause: java.lang.IllegalArgumentException
> (kafka.consumer.ConsumerFetcherThread)
> ...
>
> Thanks!
>


Kafka Connect Embedded API

2016-01-06 Thread Shiti Saxena
Hi,

I wanted to use Kafka Connect in an application and wanted to know if the
Embedded API discussed in KIP-26 is available in 0.9.0.0, or is there an
alternative?

Thanks,
Shiti


mirror maker against 0.8.2 source cluster and 0.9.0 destination cluster

2016-01-06 Thread Stephen Powis
Hey!

So I'm trying to get mirror maker going between two different clusters.  My
source cluster is version 0.8.2 and my destination cluster is 0.9.0 running
the mirror maker code from the 0.9.0 release.  Does anyone know if this is
possible to do?  I'm aware that the protocol changed slightly between
versions.

Attempting to run ./kafka-console-consumer.sh from the 0.9.0 release and
consume from my 0.8.2 cluster seems to fail, which is leading me to believe
that mirror maker will have the same issue.

Attached are the errors I receive from kafka-console-consumer.sh running
from the 0.9.0 release against a 0.8.2 cluster.

./kafka-console-consumer.sh  --zookeeper W.X.Y.Z:2181 --topic MyTopic

[2016-01-06 06:17:40,587] WARN
[ConsumerFetcherThread-console-consumer-73608_hostname-1452079050696-728e3f7e-0-100],
Error in fetch kafka.consumer.ConsumerFetcherThread$FetchRequest@6bb31fae.
Possible cause: java.lang.IllegalArgumentException
(kafka.consumer.ConsumerFetcherThread)
...

Thanks!


Partition rebalancing after broker removal

2016-01-06 Thread Tom Crayford
Hi there,

Kafka's `kafka-reassign-partitions.sh` tool currently has no mechanism for
removing brokers. However, it does have the ability to generate partition
plans across arbitrary sets of brokers by using `--generate`, passing all
the topics in the cluster into it, then passing the generated plan to
`--execute`.

This isn't ideal, because (from my understanding) it potentially moves all
the partitions in the entire cluster around, but it should work fine, and
stop Kafka from having partitions assigned to a broker that no longer
exists.

Am I missing something there? Or is this a reasonable workaround until
better partition reassignment tools turn up in the future?

Thanks

Tom


Re: Best way to commit offset on demand

2016-01-06 Thread Martin Skøtt
> in case we later changed the logic to only permit commits on assigned
partitions

I experienced this yesterday and was wondering why Kafka allows commits to
partitions from other consumers than the assigned one. Does anyone know of
the reasoning behind this?

Martin
On 5 Jan 2016 18:29, "Jason Gustafson"  wrote:

> Yes, in this case you should use assign() instead of subscribe(). I'm not
> sure it's strictly necessary at the moment to use assign() in this case,
> but it would protect your code in case we later changed the logic to only
> permit commits on assigned partitions. It also doesn't really cost
> anything.
>
> -Jason
>
> On Mon, Jan 4, 2016 at 7:49 PM, tao xiao  wrote:
>
> > Thanks for the detailed explanation. 'technically commit offsets without
> > joining group'  I assume it means that I can call assign instead of
> > subscribe on consumer which bypasses joining process.
> >
> > The reason we put the reset offset outside of the consumer process is
> that
> > we can keep the consumer code as generic as possible since the offset
> reset
> > process is not needed for all consumer logics.
> >
> > On Tue, 5 Jan 2016 at 11:18 Jason Gustafson  wrote:
> >
> > > Ah, that makes sense if you have to wait to join the group. I think you
> > > could technically commit offsets without joining if you were sure that
> > the
> > > group was dead (i.e. all consumers had either left the group cleanly or
> > > their session timeout expired). But if there are still active members,
> > then
> > > yeah, you have to join the group. Clearly you have to be a little
> careful
> > > in this case if an active consumer is still trying to read data (it
> won't
> > > necessarily see the fresh offset commits and could even overwrite
> them),
> > > but I assume you're handling this.
> > >
> > > Creating a new instance each time you want to do this seems viable to
> me
> > > (and likely how we'd end up implementing the command line utility
> > anyway).
> > > The overhead is just a couple TCP connections. It's probably as good
> (or
> > as
> > > bad) as any other approach. The join latency seems unavoidable if you
> > can't
> > > be sure that the group is dead since we do not allow non-group members
> to
> > > commit offsets by design. Any tool we write will be up against the same
> > > restriction. We might be able to think of a way to bypass it, but that
> > > sounds dangerous.
> > >
> > > Out of curiosity, what's the advantage in your use case to setting
> > offsets
> > > out-of-band? I would probably consider options for moving it into the
> > > consumer process.
> > >
> > > -Jason
> > >
> > > On Mon, Jan 4, 2016 at 6:20 PM, tao xiao  wrote:
> > >
> > > > Jason,
> > > >
> > > > It normally takes a couple of seconds sometimes it takes longer to
> > join a
> > > > group if the consumer didn't shutdown gracefully previously.
> > > >
> > > > My use case is to have a command/tool to call to reset offset for a
> > list
> > > of
> > > > partitions and a particular consumer group before the consumer is
> > started
> > > > or wait until the offset reaches a given number before the consumer
> can
> > > be
> > > > closed. I think https://issues.apache.org/jira/browse/KAFKA-3059
> fits
> > my
> > > > use case. But for now I need to find out a workaround until this
> > feature
> > > is
> > > > implemented.
> > > >
> > > > For offset reset one way I can think of is to create a consumer with
> > the
> > > > same group id that I want to reset the offset for. Then commit the
> > offset
> > > > for the particular partitions and close the consumer. Is this
> solution
> > > > viable?
> > > >
> > > > On Tue, 5 Jan 2016 at 09:56 Jason Gustafson 
> > wrote:
> > > >
> > > > > Hey Tao,
> > > > >
> > > > > Interesting that you're seeing a lot of overhead constructing the
> new
> > > > > consumer instance each time. Granted it does have to fetch topic
> > > metadata
> > > > > and lookup the coordinator, but I wouldn't have expected that to
> be a
> > > big
> > > > > problem. How long is it typically taking?
> > > > >
> > > > > -Jason
> > > > >
> > > > > On Mon, Jan 4, 2016 at 3:26 AM, Marko Bonaći <
> > > marko.bon...@sematext.com>
> > > > > wrote:
> > > > >
> > > > > > How are you consuming those topics?
> > > > > >
> > > > > > IF: I assume you have a consumer, so why not commit from within
> > that
> > > > > > consumer, after you process the message (whatever "process" means
> > to
> > > > > you).
> > > > > >
> > > > > > ELSE: couldn't you have a dedicated consumer for offset commit
> > > requests
> > > > > > that you don't shut down between requests?
> > > > > >
> > > > > > FINALLY: tell us more about your use case.
> > > > > >
> > > > > > Marko Bonaći
> > > > > > Monitoring | Alerting | Anomaly Detection | Centralized Log
> > > Management
> > > > > > Solr & Elasticsearch Support
> > > > > > Sematext  | Contact
> > > > > > 

Leader Election

2016-01-06 Thread Heath Ivie
Hi Folks,

I am trying to use the REST proxy, but I have some fundamental questions about 
how the leader election works.

My understanding is that the proxy hides all of that complexity and 
processes my produce/consume requests transparently.

Is that the case or do I need to manage that myself?

Thanks
Heath





Re: Programmable API for Kafka Connect ?

2016-01-06 Thread Alex Loddengaard
Hi Shiti,

In the context of your question, there are three relevant components to
Connect: connectors, tasks, and workers.

Connectors do a number of things, but at a high level they specify the tasks
(Java classes) that perform the work, how many of them to start, and some
coordination. The tasks do the actual work of reading/writing data to/from
Kafka (using the Connect API, not the producer/consumer API). And the workers
are the daemons, running on one or many nodes, that execute the connectors
and tasks (in distributed mode).
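
For illustration, here is a minimal source connector skeleton against the
0.9 Connect API. The class names and the one-config-per-task split are
hypothetical; a real connector would partition its work (files, tables,
etc.) across the task configs, and each class would normally live in its
own file:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.source.SourceConnector;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

class MySourceConnector extends SourceConnector {
    private Map<String, String> settings;

    @Override
    public String version() { return "0.1"; }

    @Override
    public void start(Map<String, String> props) {
        settings = props; // keep the connector config to hand to tasks
    }

    @Override
    public Class<? extends Task> taskClass() {
        return MySourceTask.class; // the class that does the actual work
    }

    @Override
    public List<Map<String, String>> taskConfigs(int maxTasks) {
        // Give every task slot the same config here; real connectors
        // divide the work across these per-task configs.
        List<Map<String, String>> configs = new ArrayList<>();
        for (int i = 0; i < maxTasks; i++)
            configs.add(new HashMap<>(settings));
        return configs;
    }

    @Override
    public void stop() {}
}

class MySourceTask extends SourceTask {
    @Override
    public String version() { return "0.1"; }

    @Override
    public void start(Map<String, String> props) {}

    @Override
    public List<SourceRecord> poll() throws InterruptedException {
        // A real task would read from the source system here and return
        // SourceRecords destined for a Kafka topic.
        return null;
    }

    @Override
    public void stop() {}
}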

The only way to start worker processes is through the command line --
bin/connect-distributed.sh.

The only way to add/remove/modify connectors in distributed mode is through
the REST API. I don't know of any API clients, unfortunately, but maybe one
exists that I'm not aware of? The REST API is started along with the worker
process, as part of bin/connect-distributed.sh, meaning you don't have to
start a separate REST process.
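
As a rough sketch, an application could register a connector by POSTing JSON
to the worker's REST endpoint (8083 is the default rest.port; the connector
name and the file/topic values below are made up, using the bundled
FileStreamSourceConnector):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class AddConnectorExample {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:8083/connectors");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);

        // Hypothetical connector config for the bundled file source.
        String body = "{\"name\": \"my-file-source\", \"config\": {"
                + "\"connector.class\": "
                + "\"org.apache.kafka.connect.file.FileStreamSourceConnector\","
                + "\"tasks.max\": \"1\","
                + "\"file\": \"/tmp/input.txt\","
                + "\"topic\": \"connect-test\"}}";

        try (OutputStream os = conn.getOutputStream()) {
            os.write(body.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP " + conn.getResponseCode()); // expect 201 Created
    }
}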

Let me know if you have any follow-up questions, Shiti.

Alex

On Tue, Jan 5, 2016 at 9:12 PM, Shiti Saxena  wrote:

> Hi,
>
> Does Kafka Connect have an API which can be used by applications to start
> Kafka Connect, add/remove Connectors?
>
> I also do not want to use the REST API and do not want to start the REST
> server.
>
> Thanks,
> Shiti
>



-- 
Alex Loddengaard | Solutions Architect | Confluent
Download Apache Kafka and Confluent Platform: www.confluent.io/download


Re: Leader Election

2016-01-06 Thread Alex Loddengaard
Hi Heath,

I assume you're referring to the partition leader. This is not something
you need to worry about when using the REST API. Kafka handles leader
election, failure recovery, etc. for you behind the scenes. Do know that if
a leader fails, you'll experience a small latency hit because a new leader
needs to be elected for the partitions that were led by said failed leader.

Alex

On Wed, Jan 6, 2016 at 8:09 AM, Heath Ivie  wrote:

> Hi Folks,
>
> I am trying to use the REST proxy, but I have some fundamental questions
> about how the leader election works.
>
> My understanding is that the proxy hides all of that complexity and
> processes my produce/consume requests transparently.
>
> Is that the case or do I need to manage that myself?
>
> Thanks
> Heath
>
>
>



-- 
Alex Loddengaard | Solutions Architect | Confluent
Download Apache Kafka and Confluent Platform: www.confluent.io/download


Security in Kafka

2016-01-06 Thread Mohit Anchlia
In the 0.9 release, it's not clear whether security features for LDAP
authentication and authorization are available. If authN and authZ are
available, can somebody point me to relevant documentation that shows how
to configure Kafka to enable them?


0.9 consumer reading a range of log messages

2016-01-06 Thread Rajiv Kurian
I want to use the new 0.9 consumer for a particular application.

My use case is the following:

i) The TopicPartition I need to poll has a short log say 10 mins odd
(log.retention.minutes is set to 10).

ii) I don't use a consumer group i.e. I manage the partition assignment
myself.

iii) Whenever I get a new partition assigned to one of my processes
(through my own mechanism), I want to query for the current end of the log
and then seek to the beginning of the log. I want to continue reading in a
straight line till my offset moves from the beginning to the end that I
queried before beginning to poll. When I am done reading this much data (I
know the end has moved by the time I've read all of it) I consider myself
caught up. Note: I only need to do the seek to the beginning of the log,
which the new consumer allows one to do. I just need to know the end of log
offset so that I know that I have "caught up".

So questions I have are:

i) How do I get the last log offset from the Kafka consumer? The
SimpleConsumer had a way to get this information. If I can get this info
from the consumer, I plan to do something like this:


private boolean assignNewPartitionAndCatchUp(int newPartition) {

    final TopicPartition newTopicPartition = new TopicPartition(myTopic,
            newPartition);

    // Queries the existing partitions and adds this TopicPartition to
    // the list.
    List<TopicPartition> newAssignment =
            createNewAssignmentByAddingPartition(newTopicPartition);

    consumer.assign(newAssignment);

    // How do I actually do this with the consumer?
    final long lastMessageOffset = getLastMessageOffset(newTopicPartition);

    consumer.seekToBeginning(newTopicPartition);

    final long timeout = 100;
    int numIterations = 0;
    boolean caughtUp = false; // not final: reassigned below

    while (!caughtUp && numIterations < maxIterations) {
        // byte[] key/value types assumed here
        ConsumerRecords<byte[], byte[]> records = consumer.poll(timeout);
        numIterations += 1;

        for (ConsumerRecord<byte[], byte[]> record : records) {
            // All messages are processed regularly even if they belong
            // to other partitions.
            processRecord(record.value());

            final int partition = record.partition();
            final long offset = record.offset();

            // Only if we find that the new partition has caught up do
            // we return.
            if (partition == newPartition && offset >= lastMessageOffset) {
                caughtUp = true;
            }
        }
    }

    return caughtUp;
}

ii) If I ask the consumer to seek to the beginning via the
consumer.seekToBeginning(newTopicPartition) call, will it handle the case
where the log has rolled over in the meanwhile and what was considered the
beginning offset is no longer present? Given my log retention is only 10
minutes and the partitions will each get quite a bit of traffic, I'd imagine
that messages will fall out of the log quite often.

iii) What settings do I need on the Kafka broker (besides
log.retention.minutes = 10) to ensure that my partitions don't retain any
more than 10 minutes of data (plus a couple minutes is fine). Do I need to
tune how often Kafka checks for log deletion eligibility? Any other
settings I should play with to ensure timely deletion?

Thanks,
Rajiv


Re: Migrate a topic which has no brokers

2016-01-06 Thread Ben Davison
Hi Gwen,

Thanks for the suggestion. We are using Kafka 0.9's "auto broker ID"
feature; is there a way to force a running Kafka instance to take on a new
broker ID?

Thanks,

Ben

On Tue, Jan 5, 2016 at 6:19 PM, Gwen Shapira  wrote:

> Stevo pointed you at the correct document for moving topics around.
> However, if you lost a broker, by far the easiest way to recover is to
> start a new broker and give it the same ID as the one that went down.
>
>
>
> On Tue, Jan 5, 2016 at 8:49 AM, Stevo Slavić  wrote:
>
> > Hello Ben,
> >
> > Yes, you can apply a different replica assignment. See related docs:
> >
> >
> http://kafka.apache.org/documentation.html#basic_ops_increase_replication_factor
> >
> > Kind regards,
> > Stevo Slavic.
> >
> > On Tue, Jan 5, 2016 at 5:32 PM, Ben Davison 
> > wrote:
> >
> > > Hi All,
> > >
> > > On our dev kafka environment our brokers went down with a topic on it,
> > can
> > > we just reassign the partitions to another broker?
> > >
> > > Kafka 0.9
> > >
> > > Thanks
> > >
> > > Ben
> > >
> >
>



Re: 0.9 consumer reading a range of log messages

2016-01-06 Thread Rajiv Kurian
Thanks for the replies, Jason!

A couple of follow-up questions:
i. If I call seekToEnd() on the consumer and then use position(), is the
position() call making a blocking IO call? It is not entirely clear from
the documentation whether this will just block or not.

ii. If the position() call does indeed block, are there any consequences in
terms of me not calling poll often enough? Is there a way for me to limit
how long I allow it to block, maybe by passing a timeout parameter? I only
use manual assignments, so I am hoping that there is no consequence of
infrequent heartbeats etc. through poll starvation.

Thanks,
Rajiv



On Wed, Jan 6, 2016 at 1:58 PM, Jason Gustafson  wrote:

> Hi Rajiv,
>
> Answers below:
>
> i) How do I get the last log offset from the Kafka consumer?
>
>
> To get the last offset, first call seekToEnd() and then use position().
>
> ii) If I ask the consumer to seek to the beginning via the  consumer
> > .seekToBeginning(newTopicPartition) call, will it handle the case where
> > the
> > log has rolled over in the meanwhile and what was considered the
> beginning
> > offset is no longer present?
>
>
> The call to seekToBeginning() only sets a flag indicating that a reset is
> needed. The actual position will not be fetched until you call poll() or
> position(). This means that the window for an out of range offset should be
> small, but of course it could happen. The behavior of the consumer when an
> offset is out of range is controlled with the "auto.offset.reset"
> configuration. If you use the "earliest" policy, then the consumer will
> automatically reset the position to whatever the current earliest offset
> is. You might also choose to use no automatic reset policy by specifying
> "none." In this case, poll() will throw an OffsetOutOfRangeException, which
> you can catch and manually re-seek to the beginning.
>
> iii) What settings do I need on the Kafka broker (besides
> > log.retention.minutes = 10) to ensure that my partitions don't retain any
> > more than 10 minutes of data (plus a couple minutes is fine). Do I need
> to
> > tune how often Kafka checks for log deletion eligibility? Any other
> > settings I should play with to ensure timely deletion?
>
>
> Log retention is currently at the granularity of log segments. This means
> that you cannot generally guarantee that messages will be deleted within
> the configured retention time. However, you can control the segment size
> using "log.segment.bytes" and the delay before deletion with "
> log.segment.delete.delay.ms." If you can estimate the incoming message
> rate, then you can probably tune these settings to get a retention policy
> closer to what you're looking for. See here for more info on broker
> configuration: https://kafka.apache.org/documentation.html#brokerconfigs.
> And for what it's worth, KIP-32, which adds a timestamp to each message,
> should provide some better options for handling this.
>
> -Jason
>
> On Wed, Jan 6, 2016 at 9:37 AM, Rajiv Kurian  wrote:
>
> > I want to use the new 0.9 consumer for a particular application.
> >
> > My use case is the following:
> >
> > i) The TopicPartition I need to poll has a short log say 10 mins odd
> > (log.retention.minutes is set to 10).
> >
> > ii) I don't use a consumer group i.e. I manage the partition assignment
> > myself.
> >
> > iii) Whenever I get a new partition assigned to one of my processes
> > (through my own mechanism), I want to query for the current end of the
> log
> > and then seek to the beginning of the log. I want to continue reading in
> a
> > straight line till my offset moves from the beginning to the end that I
> > queried before beginning to poll. When I am done reading this much data
> (I
> > know the end has moved by the time I've read all of it) I consider myself
> > caught up. Note: I only need to do the seek to the beginning of the log,
> > which the new consumer allows one to do. I just need to know the end of
> log
> > offset so that I know that I have "caught up".
> >
> > So questions I have are:
> >
> > i) How do I get the last log offset from the Kafka consumer? The
> > SimpleConsumer had a way to get this information. If I can get this info
> > from the consumer, I plan to do something like this:
> >
> >
> > private boolean assignNewPartitionAndCatchUp(int newPartition) {
> >
> > final TopicPartition newTopicPartition = new
> > TopicPartition(myTopic,
> > newPartition);
> >
> >// Queries the existing partitions and adds this TopicPartition to
> > the list.
> >
> > List newAssignment =
> > createNewAssignmentByAddingPartition(
> >
> > newTopicPartition);
> >
> > consumer.assign(newAssignment);
> >
> >
> > // How do I actually do this with the consumer?
> >
> > final long lastMessageOffset = getLastMessageOffset(
> > newTopicPartition);
> >
> >
> > consumer.seekToBeginning(newTopicPartition);
> >
> > final long 

Re: Migrate a topic which has no brokers

2016-01-06 Thread Gwen Shapira
Hey,

You can modify a broker ID by taking it down, changing the id configuration
in the properties file and starting it.
Note that the broker will lose its existing partitions and take the
partitions of the broker it is replacing (i.e. a broker can't have two
identities at the same time), so you can do this only on brokers that
started but don't have data yet.

Gwen

On Jan 6, 2016 8:25 AM, "Ben Davison"  wrote:

> Hi Gwen,
>
> Thanks for the suggestion. We are using Kafka 0.9's "auto broker ID"
> feature; is there a way to force a running Kafka instance to take on a new
> broker ID?
>
> Thanks,
>
> Ben
>
> On Tue, Jan 5, 2016 at 6:19 PM, Gwen Shapira  wrote:
>
> > Stevo pointed you at the correct document for moving topics around.
> > However, if you lost a broker, by far the easiest way to recover is to
> > start a new broker and give it the same ID as the one that went down.
> >
> >
> >
> > On Tue, Jan 5, 2016 at 8:49 AM, Stevo Slavić  wrote:
> >
> > > Hello Ben,
> > >
> > > Yes, you can apply a different replica assignment. See related docs:
> > >
> > >
> >
> http://kafka.apache.org/documentation.html#basic_ops_increase_replication_factor
> > >
> > > Kind regards,
> > > Stevo Slavic.
> > >
> > > On Tue, Jan 5, 2016 at 5:32 PM, Ben Davison 
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > On our dev kafka environment our brokers went down with a topic on
> it,
> > > can
> > > > we just reassign the partitions to another broker?
> > > >
> > > > Kafka 0.9
> > > >
> > > > Thanks
> > > >
> > > > Ben
> > > >
> > >
> >
>
>


Re: 0.9 consumer reading a range of log messages

2016-01-06 Thread Jason Gustafson
Hi Rajiv,

Answers below:

i) How do I get the last log offset from the Kafka consumer?


To get the last offset, first call seekToEnd() and then use position().
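
For illustration, a minimal sketch with the 0.9 consumer (the bootstrap
servers, topic, and partition are placeholders; note that seekToEnd() takes
varargs in 0.9 but a Collection in later releases):

import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class EndOffsetExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
        TopicPartition tp = new TopicPartition("MyTopic", 0); // placeholder

        consumer.assign(Arrays.asList(tp));
        consumer.seekToEnd(tp);                 // only sets a flag, no I/O yet
        long endOffset = consumer.position(tp); // blocks while the offset is fetched

        consumer.seekToBeginning(tp);           // rewind; read until endOffset
    }
}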

ii) If I ask the consumer to seek to the beginning via the  consumer
> .seekToBeginning(newTopicPartition) call, will it handle the case where
> the
> log has rolled over in the meanwhile and what was considered the beginning
> offset is no longer present?


The call to seekToBeginning() only sets a flag indicating that a reset is
needed. The actual position will not be fetched until you call poll() or
position(). This means that the window for an out of range offset should be
small, but of course it could happen. The behavior of the consumer when an
offset is out of range is controlled with the "auto.offset.reset"
configuration. If you use the "earliest" policy, then the consumer will
automatically reset the position to whatever the current earliest offset
is. You might also choose to use no automatic reset policy by specifying
"none." In this case, poll() will throw an OffsetOutOfRangeException, which
you can catch and manually re-seek to the beginning.
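
With the "none" policy, the catch-and-reseek could look roughly like this
(continuing the sketch above, with consumer and tp already set up and
"auto.offset.reset" set to "none"):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.OffsetOutOfRangeException;

// ... inside the poll loop:
while (true) {
    try {
        ConsumerRecords<byte[], byte[]> records = consumer.poll(100);
        for (ConsumerRecord<byte[], byte[]> record : records) {
            // process(record) -- application-specific handling
        }
    } catch (OffsetOutOfRangeException e) {
        // The position fell off the log (short retention); rewind to
        // whatever the current earliest offset is and keep going.
        consumer.seekToBeginning(tp); // 0.9 varargs signature
    }
}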

iii) What settings do I need on the Kafka broker (besides
> log.retention.minutes = 10) to ensure that my partitions don't retain any
> more than 10 minutes of data (plus a couple minutes is fine). Do I need to
> tune how often Kafka checks for log deletion eligibility? Any other
> settings I should play with to ensure timely deletion?


Log retention is currently at the granularity of log segments. This means
that you cannot generally guarantee that messages will be deleted within
the configured retention time. However, you can control the segment size
using "log.segment.bytes" and the delay before deletion with "
log.segment.delete.delay.ms." If you can estimate the incoming message
rate, then you can probably tune these settings to get a retention policy
closer to what you're looking for. See here for more info on broker
configuration: https://kafka.apache.org/documentation.html#brokerconfigs.
And for what it's worth, KIP-32, which adds a timestamp to each message,
should provide some better options for handling this.

-Jason

On Wed, Jan 6, 2016 at 9:37 AM, Rajiv Kurian  wrote:

> I want to use the new 0.9 consumer for a particular application.
>
> My use case is the following:
>
> i) The TopicPartition I need to poll has a short log say 10 mins odd
> (log.retention.minutes is set to 10).
>
> ii) I don't use a consumer group i.e. I manage the partition assignment
> myself.
>
> iii) Whenever I get a new partition assigned to one of my processes
> (through my own mechanism), I want to query for the current end of the log
> and then seek to the beginning of the log. I want to continue reading in a
> straight line till my offset moves from the beginning to the end that I
> queried before beginning to poll. When I am done reading this much data (I
> know the end has moved by the time I've read all of it) I consider myself
> caught up. Note: I only need to do the seek to the beginning of the log,
> which the new consumer allows one to do. I just need to know the end of log
> offset so that I know that I have "caught up".
>
> So questions I have are:
>
> i) How do I get the last log offset from the Kafka consumer? The
> SimpleConsumer had a way to get this information. If I can get this info
> from the consumer, I plan to do something like this:
>
>
> private boolean assignNewPartitionAndCatchUp(int newPartition) {
>
> final TopicPartition newTopicPartition = new
> TopicPartition(myTopic,
> newPartition);
>
>// Queries the existing partitions and adds this TopicPartition to
> the list.
>
> List newAssignment =
> createNewAssignmentByAddingPartition(
>
> newTopicPartition);
>
> consumer.assign(newAssignment);
>
>
> // How do I actually do this with the consumer?
>
> final long lastMessageOffset = getLastMessageOffset(
> newTopicPartition);
>
>
> consumer.seekToBeginning(newTopicPartition);
>
> final long timeout = 100;
>
> int numIterations = 0;
>
>final boolean caughtUp = false;
>
> while (!caughtUp && numIterations < maxIterations) {
>
> ConsumerRecords records = consumer.poll(timeout);
>
> numIterations += 1;
>
> for (ConsumerRecord record : records) {
>
>//  All messages are processed regularly even if they belong
> to other partitions.
>
> processRecord(record.value());
>
> final int partition = record.partition();
>
> final long offset = record.offset();
>
> // Only if we find that the new partition has caught up do
> we return.
>
> if (partition == newPartition && offset >=
> lastMessageOffset)
> {
>
> caughtUp = true;
>
> }
>
> }
>
> }
>
> return 

Re: Security in Kafka

2016-01-06 Thread Jun Rao
Mohit,

In 0.9, Kafka supports Kerberos. So, you can authenticate with any
directory service that supports Kerberos (e.g., Active Directory). Our
default authorization is based on individual users, not on groups though.

You can find a bit more info on security in the following links.

http://kafka.apache.org/090/documentation.html#security_sasl
http://docs.confluent.io/2.0.0/kafka/sasl.html

Thanks,

Jun

On Wed, Jan 6, 2016 at 9:30 AM, Mohit Anchlia 
wrote:

> In the 0.9 release, it's not clear whether security features for LDAP
> authentication and authorization are available. If authN and authZ are
> available, can somebody point me to relevant documentation that shows how
> to configure Kafka to enable them?
>


Re: 0.9 consumer reading a range of log messages

2016-01-06 Thread Rajiv Kurian
On Wed, Jan 6, 2016 at 5:31 PM, Jason Gustafson  wrote:

> >
> > i. If I call seekToEnd() on the consumer and then use position() is the
> > position() call making a blocking IO call. It is not entirely clear from
> > the documentation whether this will just block or not.
> >
>
> Yes, position() will block when the offset needs to be reset. You can
> depend on it returning the right offset after a call to seekToEnd().
>
> ii. If the position() call does indeed block are there any consequences in
> > terms of me not calling poll often enough?
>
>
> Since you are using manual assignment, there should be no consequence to
> not calling poll() frequently enough. The heartbeats are only needed when
> the consumer is using automatic assignment. Unfortunately, there is no way
> to set a timeout for this call at the moment. In the worst case, if the
> partition leader is unavailable, it could block indefinitely. We've
> considered several times adding a "max.block.ms" configuration setting
> which would work just as it does for the producer, but we wanted to gauge
> the level of interest before adding yet another setting. Typically we would
> expect the partition leader to become available "soon" and the thought was
> that users would generally just retry anyway.
>
I think a timeout setting for all blocking calls would be very useful.
Given that the subscriber is going to be called from a single thread, any
blocking call can starve other multiplexed partitions even if their brokers
are fine. So this could lead to one down broker causing the entire consumer
to grind to a halt. A user might instead decide to give up after some
timeout and do something more meaningful than blocking the entire thread.

>
> -Jason
>
> On Wed, Jan 6, 2016 at 4:31 PM, Rajiv Kurian  wrote:
>
> > Thanks for the replies, Jason!
> >
> > A couple of follow up questions:
> > i. If I call seekToEnd() on the consumer and then use position() is the
> > position() call making a blocking IO call. It is not entirely clear from
> > the documentation whether this will just block or not.
> >
> > ii. If the position() call does indeed block are there any consequences
> in
> > terms of me not calling poll often enough? Is there a way for me to limit
> > how long I allow it to block, maybe by passing a timeout parameter. I
> only
> > use manual assignments so I am hoping that there is no consequence of
> > infrequent heart beats etc through poll starvation.
> >
> > Thanks,
> > Rajiv
> >
> >
> >
> > On Wed, Jan 6, 2016 at 1:58 PM, Jason Gustafson 
> > wrote:
> >
> > > Hi Rajiv,
> > >
> > > Answers below:
> > >
> > > i) How do I get the last log offset from the Kafka consumer?
> > >
> > >
> > > To get the last offset, first call seekToEnd() and then use position().
> > >
> > > ii) If I ask the consumer to seek to the beginning via the  consumer
> > > > .seekToBeginning(newTopicPartition) call, will it handle the case
> where
> > > > the
> > > > log has rolled over in the meanwhile and what was considered the
> > > beginning
> > > > offset is no longer present?
> > >
> > >
> > > The call to seekToBeginning() only sets a flag indicating that a reset
> is
> > > needed. The actual position will not be fetched until you call poll()
> or
> > > position(). This means that the window for an out of range offset
> should
> > be
> > > small, but of course it could happen. The behavior of the consumer when
> > an
> > > offset is out of range is controlled with the "auto.offset.reset"
> > > configuration. If you use the "earliest" policy, then the consumer will
> > > automatically reset the position to whatever the current earliest
> offset
> > > is. You might also choose to use no automatic reset policy by
> specifying
> > > "none." In this case, poll() will throw an OffsetOutOfRangeException,
> > which
> > > you can catch and manually re-seek to the beginning.
> > >
> > > iii) What settings do I need on the Kafka broker (besides
> > > > log.retention.minutes = 10) to ensure that my partitions don't retain
> > any
> > > > more than 10 minutes of data (plus a couple minutes is fine). Do I
> need
> > > to
> > > > tune how often Kafka checks for log deletion eligibility? Any other
> > > > settings I should play with to ensure timely deletion?
> > >
> > >
> > > Log retention is currently at the granularity of log segments. This
> means
> > > that you cannot generally guarantee that messages will be deleted
> within
> > > the configured retention time. However, you can control the segment
> size
> > > using "log.segment.bytes" and the delay before deletion with "
> > > log.segment.delete.delay.ms." If you can estimate the incoming message
> > > rate, then you can probably tune these settings to get a retention
> policy
> > > closer to what you're looking for. See here for more info on broker
> > > configuration:
> https://kafka.apache.org/documentation.html#brokerconfigs
> > .
> > > And for what it's worth, KIP-32, 

Re: 0.9 consumer reading a range of log messages

2016-01-06 Thread Jason Gustafson
>
> i. If I call seekToEnd() on the consumer and then use position() is the
> position() call making a blocking IO call. It is not entirely clear from
> the documentation whether this will just block or not.
>

Yes, position() will block when the offset needs to be reset. You can
depend on it returning the right offset after a call to seekToEnd().

ii. If the position() call does indeed block are there any consequences in
> terms of me not calling poll often enough?


Since you are using manual assignment, there should be no consequence to
not calling poll() frequently enough. The heartbeats are only needed when
the consumer is using automatic assignment. Unfortunately, there is no way
to set a timeout for this call at the moment. In the worst case, if the
partition leader is unavailable, it could block indefinitely. We've
considered several times adding a "max.block.ms" configuration setting
which would work just as it does for the producer, but we wanted to gauge
the level of interest before adding yet another setting. Typically we would
expect the partition leader to become available "soon" and the thought was
that users would generally just retry anyway.

-Jason

On Wed, Jan 6, 2016 at 4:31 PM, Rajiv Kurian  wrote:

> Thanks for the replies, Jason!
>
> A couple of follow up questions:
> i. If I call seekToEnd() on the consumer and then use position() is the
> position() call making a blocking IO call. It is not entirely clear from
> the documentation whether this will just block or not.
>
> ii. If the position() call does indeed block are there any consequences in
> terms of me not calling poll often enough? Is there a way for me to limit
> how long I allow it to block, maybe by passing a timeout parameter. I only
> use manual assignments so I am hoping that there is no consequence of
> infrequent heart beats etc through poll starvation.
>
> Thanks,
> Rajiv
>
>
>
> On Wed, Jan 6, 2016 at 1:58 PM, Jason Gustafson 
> wrote:
>
> > Hi Rajiv,
> >
> > Answers below:
> >
> > i) How do I get the last log offset from the Kafka consumer?
> >
> >
> > To get the last offset, first call seekToEnd() and then use position().
> >
> > ii) If I ask the consumer to seek to the beginning via the  consumer
> > > .seekToBeginning(newTopicPartition) call, will it handle the case where
> > > the
> > > log has rolled over in the meanwhile and what was considered the
> > beginning
> > > offset is no longer present?
> >
> >
> > The call to seekToBeginning() only sets a flag indicating that a reset is
> > needed. The actual position will not be fetched until you call poll() or
> > position(). This means that the window for an out of range offset should
> be
> > small, but of course it could happen. The behavior of the consumer when
> an
> > offset is out of range is controlled with the "auto.offset.reset"
> > configuration. If you use the "earliest" policy, then the consumer will
> > automatically reset the position to whatever the current earliest offset
> > is. You might also choose to use no automatic reset policy by specifying
> > "none." In this case, poll() will throw an OffsetOutOfRangeException,
> which
> > you can catch and manually re-seek to the beginning.
> >
> > iii) What settings do I need on the Kafka broker (besides
> > > log.retention.minutes = 10) to ensure that my partitions don't retain
> any
> > > more than 10 minutes of data (plus a couple minutes is fine). Do I need
> > to
> > > tune how often Kafka checks for log deletion eligibility? Any other
> > > settings I should play with to ensure timely deletion?
> >
> >
> > Log retention is currently at the granularity of log segments. This means
> > that you cannot generally guarantee that messages will be deleted within
> > the configured retention time. However, you can control the segment size
> > using "log.segment.bytes" and the delay before deletion with "
> > log.segment.delete.delay.ms." If you can estimate the incoming message
> > rate, then you can probably tune these settings to get a retention policy
> > closer to what you're looking for. See here for more info on broker
> > configuration: https://kafka.apache.org/documentation.html#brokerconfigs
> .
> > And for what it's worth, KIP-32, which adds a timestamp to each message,
> > should provide some better options for handling this.
> >
> > -Jason
> >
> > On Wed, Jan 6, 2016 at 9:37 AM, Rajiv Kurian  wrote:
> >
> > > I want to use the new 0.9 consumer for a particular application.
> > >
> > > My use case is the following:
> > >
> > > i) The TopicPartition I need to poll has a short log say 10 mins odd
> > > (log.retention.minutes is set to 10).
> > >
> > > ii) I don't use a consumer group i.e. I manage the partition assignment
> > > myself.
> > >
> > > iii) Whenever I get a new partition assigned to one of my processes
> > > (through my own mechanism), I want to query for the current end of the
> > log
> > > and then seek to the beginning of 

Re: Best way to commit offset on demand

2016-01-06 Thread Jason Gustafson
Not sure there's a great reason. In the initial design, the server itself
only permitted commits from consumers that were assigned the respective
partitions, but we lost this when we generalized the group coordination
protocol. It seems like it still makes sense to do it on the client though,
so it's possible that this was just forgotten in all the noise. I'll open a
JIRA and see what others think.
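
For reference, the offset-reset workaround discussed earlier in this thread
(assign() plus a synchronous commit, with no group join) might look roughly
like this with the 0.9 consumer; the group, topic, and offset values are
placeholders:

import java.util.Arrays;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ResetOffsetExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "my-consumer-group");       // group to reset
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
        TopicPartition tp = new TopicPartition("MyTopic", 0); // placeholder

        // assign(), not subscribe(): commits without joining the group
        consumer.assign(Arrays.asList(tp));
        consumer.commitSync(Collections.singletonMap(tp,
                new OffsetAndMetadata(42L))); // desired offset
        consumer.close();
    }
}

The usual caveat from this thread applies: if the group still has active
members, they won't necessarily see the fresh commit.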

-Jason

On Wed, Jan 6, 2016 at 6:49 AM, Martin Skøtt  wrote:

> > in case we later changed the logic to only permit commits on assigned
> partitions
>
> I experienced this yesterday and was wondering why Kafka allows commits to
> partitions from other consumers than the assigned one. Does anyone know of
> the reasoning behind this?
>
> Martin
> On 5 Jan 2016 18:29, "Jason Gustafson"  wrote:
>
> > Yes, in this case you should use assign() instead of subscribe(). I'm not
> > sure it's strictly necessary at the moment to use assign() in this case,
> > but it would protect your code in case we later changed the logic to only
> > permit commits on assigned partitions. It also doesn't really cost
> > anything.
> >
> > -Jason
> >
> > On Mon, Jan 4, 2016 at 7:49 PM, tao xiao  wrote:
> >
> > > Thanks for the detailed explanation. 'technically commit offsets
> without
> > > joining group'  I assume it means that I can call assign instead of
> > > subscribe on consumer which bypasses joining process.
> > >
> > > The reason we put the reset offset outside of the consumer process is
> > that
> > > we can keep the consumer code as generic as possible since the offset
> > reset
> > > process is not needed for all consumer logics.
> > >
> > > On Tue, 5 Jan 2016 at 11:18 Jason Gustafson 
> wrote:
> > >
> > > > Ah, that makes sense if you have to wait to join the group. I think
> you
> > > > could technically commit offsets without joining if you were sure
> that
> > > the
> > > > group was dead (i.e. all consumers had either left the group cleanly
> or
> > > > their session timeout expired). But if there are still active
> members,
> > > then
> > > > yeah, you have to join the group. Clearly you have to be a little
> > careful
> > > > in this case if an active consumer is still trying to read data (it
> > won't
> > > > necessarily see the fresh offset commits and could even overwrite
> > them),
> > > > but I assume you're handling this.
> > > >
> > > > Creating a new instance each time you want to do this seems viable to
> > me
> > > > (and likely how we'd end up implementing the command line utility
> > > anyway).
> > > > The overhead is just a couple TCP connections. It's probably as good
> > (or
> > > as
> > > > bad) as any other approach. The join latency seems unavoidable if you
> > > can't
> > > > be sure that the group is dead since we do not allow non-group
> members
> > to
> > > > commit offsets by design. Any tool we write will be up against the
> same
> > > > restriction. We might be able to think of a way to bypass it, but
> that
> > > > sounds dangerous.
> > > >
> > > > Out of curiosity, what's the advantage in your use case to setting
> > > offsets
> > > > out-of-band? I would probably consider options for moving it into the
> > > > consumer process.
> > > >
> > > > -Jason
> > > >
> > > > On Mon, Jan 4, 2016 at 6:20 PM, tao xiao 
> wrote:
> > > >
> > > > > Jason,
> > > > >
> > > > > It normally takes a couple of seconds sometimes it takes longer to
> > > join a
> > > > > group if the consumer didn't shutdown gracefully previously.
> > > > >
> > > > > My use case is to have a command/tool to call to reset offset for a
> > > list
> > > > of
> > > > > partitions and a particular consumer group before the consumer is
> > > started
> > > > > or wait until the offset reaches a given number before the consumer
> > can
> > > > be
> > > > > closed. I think https://issues.apache.org/jira/browse/KAFKA-3059
> > fits
> > > my
> > > > > use case. But for now I need to find out a workaround until this
> > > feature
> > > > is
> > > > > implemented.
> > > > >
> > > > > For offset reset one way I can think of is to create a consumer
> with
> > > the
> > > > > same group id that I want to reset the offset for. Then commit the
> > > offset
> > > > > for the particular partitions and close the consumer. Is this
> > solution
> > > > > viable?
> > > > >
> > > > > On Tue, 5 Jan 2016 at 09:56 Jason Gustafson 
> > > wrote:
> > > > >
> > > > > > Hey Tao,
> > > > > >
> > > > > > Interesting that you're seeing a lot of overhead constructing the
> > new
> > > > > > consumer instance each time. Granted it does have to fetch topic
> > > > metadata
> > > > > > and lookup the coordinator, but I wouldn't have expected that to
> > be a
> > > > big
> > > > > > problem. How long is it typically taking?
> > > > > >
> > > > > > -Jason
> > > > > >
> > > > > > On Mon, Jan 4, 2016 at 3:26 AM, Marko Bonaći <
> > > >