Re: Figuring out lag within Java consumer application

2017-10-12 Thread Stephen Powis
So I have the same use case as the original poster and had the same issue
with the older 0.10.x clients: I could not determine the tail offsets even
though the fetch response contains the HW (high watermark).

From what I could understand by tracing through the 0.11.0 consumer code,
it makes additional API/network calls to the Kafka cluster to retrieve the
tail/end offset information.  Assuming I haven't misread or misunderstood
the code, for most use cases this probably makes sense.  But in
time-sensitive code, it bummed me out to have to make additional calls to
get that information when technically it's already available via the HW
property in the fetches; the consumer just has no access to it.

Is there any talk about exposing this property somewhere in the consumers
in the future?

On Fri, Oct 13, 2017 at 8:35 AM, Manan G wrote:

> NM. 0.11 KafkaConsumer seems to have added "endOffsets" API!


Re: Figuring out lag within Java consumer application

2017-10-12 Thread Manan G
NM. 0.11 KafkaConsumer seems to have added "endOffsets" API!
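
For reference, a minimal sketch of the lag arithmetic once end offsets are
available. It assumes the end offsets come from KafkaConsumer.endOffsets(...)
and the current positions from KafkaConsumer.position(...). To keep the
snippet free of the Kafka client dependency, partitions are keyed by plain
strings such as "my-topic-0" (a stand-in for TopicPartition), and
LagCalculator is a hypothetical helper name, not part of the Kafka API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: not part of the Kafka client library.
final class LagCalculator {

    /**
     * Computes per-partition lag as (end offset - current position).
     * Keys are partition labels such as "my-topic-0"; in a real application
     * they would be TopicPartition objects, with endOffsets taken from
     * consumer.endOffsets(consumer.assignment()) and positions from
     * consumer.position(tp) for each assigned partition.
     */
    static Map<String, Long> computeLag(Map<String, Long> endOffsets,
                                        Map<String, Long> positions) {
        Map<String, Long> lag = new HashMap<>();
        for (Map.Entry<String, Long> entry : endOffsets.entrySet()) {
            Long position = positions.get(entry.getKey());
            if (position == null) {
                continue; // no known position for this partition, so skip it
            }
            // Clamp at zero in case the end offset snapshot is slightly stale.
            lag.put(entry.getKey(), Math.max(0L, entry.getValue() - position));
        }
        return lag;
    }

    /** Sums the per-partition lag into a single number. */
    static long totalLag(Map<String, Long> lagByPartition) {
        long total = 0L;
        for (long value : lagByPartition.values()) {
            total += value;
        }
        return total;
    }
}
```

The clamp at zero is a design choice for the case where the two snapshots are
taken at slightly different times; drop it if a negative value is more useful
for debugging.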



Backfill historical data

2017-10-12 Thread Riccardo Ferrari
Hi list,

I need to backfill a Kafka 0.10.1 topic with some historical data (time
series) coming from a DB.

How can I populate the data and still leverage the self-expiring
(retention) capabilities?

Is there a best practice for bulk loading historical data?

Thanks,


Figuring out lag within Java consumer application

2017-10-12 Thread Manan G
For my use case, I need to figure out the lag within the Java consumer
itself that is consuming some topic. Ideally, the consumer application
would monitor the lag every minute or so and take some action on its own if
the consumer falls behind (e.g. spin up more threads to process records; my
use case does not care about record order). For our purposes, it is OK if
the lag information is a bit stale.
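
To make the idea concrete, here is a rough sketch of such a periodic check
using only the JDK. LagMonitor, the threshold, and the callback are all
hypothetical names for illustration. The lag itself is supplied by a
LongSupplier; since KafkaConsumer is not safe for multi-threaded access, the
sketch assumes the polling thread publishes the latest lag (for example into
an AtomicLong) and the monitor only reads that value.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical helper: periodically compares lag against a threshold.
final class LagMonitor implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final LongSupplier lagSupplier;   // reads the most recently published lag
    private final long threshold;             // lag above this triggers the action
    private final Runnable onLagExceeded;     // e.g. add worker threads

    LagMonitor(LongSupplier lagSupplier, long threshold, Runnable onLagExceeded) {
        this.lagSupplier = lagSupplier;
        this.threshold = threshold;
        this.onLagExceeded = onLagExceeded;
    }

    /** Runs a single check; returns true if the action was triggered. */
    boolean checkOnce() {
        if (lagSupplier.getAsLong() > threshold) {
            onLagExceeded.run();
            return true;
        }
        return false;
    }

    /** Schedules checkOnce() at a fixed rate, e.g. start(1, TimeUnit.MINUTES). */
    void start(long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                checkOnce();
            } catch (RuntimeException e) {
                // Swallow so that one failed check does not cancel future runs.
            }
        }, period, period, unit);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Because slightly stale lag is acceptable here, reading a value published by
the polling thread keeps the monitor off the consumer entirely.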

* AFAIK, the Java KafkaConsumer APIs do not seem to expose any information
from which I can directly figure out the lag within my Java consumer
application.

* It seems that, alternatively, I can create a separate KafkaConsumer (or
possibly use the existing KafkaConsumer my consumer application is using),
seek to the end, and call the "position()" API to figure out the end offsets
for all partitions I am interested in. Since my consumer application already
knows which offset the consumer is at, I can figure out the lag. However,
for such a basic piece of information, this solution is a bit more involved
for our framework for various reasons. Also, it requires one either to use a
separate KafkaConsumer just to figure out lag, or to re-use the same
KafkaConsumer our consumer application is using and somehow implement the
logic of forwarding to the end to find the end offset and then resetting back
(not sure yet whether that is feasible without issues).

* If I am not mistaken, looking at KafkaConsumer code itself, "high
watermark" information seems to be already there in FetchResponse
(FetchResponse.PartitionData). It's just that it is not exposed. Is there
any way to retrieve this information somehow in my consumer application?
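
The "forward to the end and reset back" logic from the second bullet could be
sketched roughly as below. To stay runnable without the Kafka client, it uses
a hypothetical single-partition interface, SeekableLog, whose three methods
mirror KafkaConsumer's position(), seekToEnd() and seek(); the real methods
take TopicPartition arguments, and whether this is safe to interleave with
normal polling is not something this sketch establishes.

```java
// Hypothetical single-partition stand-in for the relevant KafkaConsumer calls.
interface SeekableLog {
    long position();          // offset of the next record to be fetched
    void seekToEnd();         // move to the end of the partition
    void seek(long offset);   // move to an explicit offset
}

// Hypothetical helper: not part of the Kafka client library.
final class EndOffsetProbe {

    /**
     * Remembers the current position, jumps to the end to read the end
     * offset, then restores the original position even if a call fails.
     */
    static long lagViaSeek(SeekableLog log) {
        long current = log.position();
        try {
            log.seekToEnd();
            long end = log.position();
            return Math.max(0L, end - current);
        } finally {
            log.seek(current); // always reset back to where consumption was
        }
    }
}
```

The try/finally is the important part: without it, a failure while probing
would leave the consumer positioned at the end and silently skip records.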

In general, wouldn't it be useful for a consumer application to have lag
information available through a simple API call?

Thanks,
M


Re: Incorrect consumer offsets after broker restart 0.11.0.0

2017-10-12 Thread Elyahou Ittah
Yes, this is the Jira ticket about this issue:

https://issues.apache.org/jira/browse/KAFKA-5600

On Wed, Oct 11, 2017 at 5:47 PM, Vincent Dautremont <
vincent.dautrem...@olamobile.com.invalid> wrote:

> I would also like to know the related Jira ticket, if any, to check whether
> what I am experiencing is the same phenomenon.
> I see this happening even without restarting the Kafka broker process:
>
> I sometimes have a ZooKeeper socket that fails; the Kafka broker then steps
> down from its leader duties for a few seconds before re-entering the cluster,
> and consumers get a response from the broker saying that it is no longer
> part of the cluster.
>
> So my client resets, gets reassigned partitions, and gets some old CG
> offsets from days ago (often from log data that was already deleted).
>
> Thanks.
>
>
> On Wed, Oct 11, 2017 at 1:59 PM, Phil Luckhurst <
> phil.luckhu...@encycle.com>
> wrote:
>
> > Upgrading the broker to version 0.11.0.1 has fixed the problem.
> >
> > Thanks.
> >
>


Re: [VOTE] 1.0.0 RC0

2017-10-12 Thread Ismael Juma
See inline.

On Thu, Oct 12, 2017 at 6:43 AM, Vahid S Hashemian <
vahidhashem...@us.ibm.com> wrote:
>
>
> [2017-10-11 21:45:11,642] FATAL  (kafka.Kafka$)
> java.lang.IllegalArgumentException: Unknown signal: HUP
> at sun.misc.Signal.<init>(Unknown Source)
> at kafka.Kafka$.registerHandler$1(Kafka.scala:67)
> at kafka.Kafka$.registerLoggingSignalHandler(Kafka.scala:73)
> at kafka.Kafka$.main(Kafka.scala:82)
> at kafka.Kafka.main(Kafka.scala)
>

https://github.com/apache/kafka/pull/4066

Ismael