Re: Any tool to easily fetch a single message or a few messages starting from a given offset then exit after fetching said count of messages?

2015-07-24 Thread Magnus Edenhill
Hi David,

kafkacat does this with ease:
  kafkacat -b mybroker -t mytopic -p mypartition -o myoffset -c msgcount

If you plan to pass the output to another program or script, you may want to
look into the -f formatting option, or -J for JSON output.

See here:
https://github.com/edenhill/kafkacat
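
For example, to fetch a single message at offset 1234 as JSON (a sketch;
the broker, topic, partition, and offset are placeholders):

  kafkacat -b mybroker -t mytopic -p 0 -o 1234 -c 1 -J

The -J output wraps each message in a JSON object that includes the topic,
partition, offset, key, and payload, which is easy to parse downstream.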


Regards,
Magnus


2015-07-25 2:20 GMT+02:00 David Luu :

> Hi,
>
> I notice the kafka-console-consumer.sh script has an option to fetch a
> maximum number of messages, which can be 1 or more, and then exit, which is
> nice. But as a high-level consumer, it's missing an option to fetch from a
> given offset other than the earliest and latest offsets.
>
> Is there any off-the-shelf tool (CLI, simple script, etc.) that I can use
> to fetch 1 or more messages from a given offset of some topic as a
> consumer, then exit? Or do I have to write my own tool to do this with any
> of the kafka libraries in various languages? If the latter, is there one
> with which I can whip it up in the fewest lines possible? I don't want to
> have to read up and spend time crafting a tool for this if I don't have to.
>
> I wish this feature were already built into the kafka toolchain.
>
> Regards,
> David
>


Re: deleting data automatically

2015-07-24 Thread gharatmayuresh15
You can configure this in the broker configs by setting the log retention properties:

http://kafka.apache.org/07/configuration.html

Thanks,

Mayuresh

Sent from my iPhone

> On Jul 24, 2015, at 12:49 PM, Yuheng Du  wrote:
> 
> Hi,
> 
> I am testing the kafka producer performance, so I created a queue and am
> writing a large amount of data to that queue.
> 
> Is there a way to delete the data automatically after some time, say
> whenever the data size reaches 50GB or the retention time exceeds 10
> seconds, so my disk won't fill up and new data can still be written?
> 
> Thanks!


Re: deleting data automatically

2015-07-24 Thread Ewen Cheslack-Postava
You'll want to set the log retention policy via
log.retention.{ms,minutes,hours} or log.retention.bytes. If you want really
aggressive collection (e.g., on the order of seconds, as you specified),
you might also need to adjust log.segment.bytes/log.roll.{ms,hours} and
log.retention.check.interval.ms.
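
For example, a minimal server.properties sketch for very aggressive cleanup
(the values are illustrative only, not recommendations):

  # delete segments whose messages are older than ~10 seconds
  log.retention.ms=10000
  # cap each partition at roughly 50GB
  log.retention.bytes=53687091200
  # roll small segments so old data becomes eligible for deletion sooner
  log.segment.bytes=10485760
  # look for deletable segments every 5 seconds
  log.retention.check.interval.ms=5000

Only rolled (closed) segments are deleted, which is why the segment settings
matter at these timescales.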

On Fri, Jul 24, 2015 at 12:49 PM, Yuheng Du 
wrote:

> Hi,
>
> I am testing the kafka producer performance, so I created a queue and am
> writing a large amount of data to that queue.
>
> Is there a way to delete the data automatically after some time, say
> whenever the data size reaches 50GB or the retention time exceeds 10
> seconds, so my disk won't fill up and new data can still be written?
>
> Thanks!
>



-- 
Thanks,
Ewen


Re: Log Deletion Behavior

2015-07-24 Thread Grant Henke
Also this stackoverflow answer may help: http://stackoverflow.com/a/29672325

On Fri, Jul 24, 2015 at 9:36 PM, Grant Henke  wrote:

> I would actually suggest only using the ms versions of the retention
> config. Be sure to check/set all the configs below and look at the
> documented explanations here
> http://kafka.apache.org/documentation.html#brokerconfigs.
>
> I am guessing the default log.retention.check.interval.ms of 5 minutes is
> too long for your case, or you may have changed the default
> log.cleanup.policy so that it is no longer set to delete.
>
> log.retention.ms
> log.retention.check.interval.ms
> log.cleanup.policy
>
> On Fri, Jul 24, 2015 at 9:14 PM, Mayuresh Gharat <
> gharatmayures...@gmail.com> wrote:
>
>> To add on, the main thing here is you should be using only one of these
>> properties.
>>
>> Thanks,
>>
>> Mayuresh
>>
>> On Fri, Jul 24, 2015 at 6:47 PM, Mayuresh Gharat <
>> gharatmayures...@gmail.com
>> > wrote:
>>
>> > Yes. It should. Do not set other retention settings. Just use the
>> "hours"
>> > settings.
>> > Let me know about this :)
>> >
>> > Thanks,
>> >
>> > Mayuresh
>> >
>> > On Fri, Jul 24, 2015 at 6:43 PM, JIEFU GONG  wrote:
>> >
>> >> Mayuresh, thanks for your comment. I won't be able to change these
>> >> settings until next Monday, but just to confirm: you are saying that if
>> >> I restart the brokers, my logs should delete themselves with respect to
>> >> the newest settings, correct?
>> >>
>> >> On Fri, Jul 24, 2015 at 6:29 PM, Mayuresh Gharat <
>> >> gharatmayures...@gmail.com
>> >> > wrote:
>> >>
>> >> > No. This should not happen. At LinkedIn we just use the log retention
>> >> > hours. Try using that. Change it and bounce the broker. It should work.
>> >> > Also, looking back at the configs, I am not sure why we have three
>> >> > different configs for the same property:
>> >> >
>> >> > "log.retention.ms"
>> >> > "log.retention.minutes"
>> >> > "log.retention.hours"
>> >> >
>> >> > We should probably have just the milliseconds version.
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Mayuresh
>> >> >
>> >> > On Fri, Jul 24, 2015 at 4:12 PM, JIEFU GONG 
>> wrote:
>> >> >
>> >> > > Hi all,
>> >> > >
>> >> > > I have a few broad questions on how log deletion works,
>> specifically
>> >> in
>> >> > > conjunction with the log.retention.time setting. Say I published
>> some
>> >> > > messages to some topics when the configuration was originally set
>> to
>> >> > > something like log.retention.hours=168 (default). If I publish
>> these
>> >> > > messages successfully, then later set the configuration to
>> something
>> >> like
>> >> > > log.retention.minutes=1, are those logs supposed to persist for the
>> >> > newest
>> >> > > settings or the old settings? Right now my logs are refusing to
>> delete
>> >> > > themselves unless I specifically mark them for deletion -- is this
>> the
>> >> > > correct/anticipated/wanted behavior?
>> >> > >
>> >> > > Thanks for the help!
>> >> > >
>> >> > > --
>> >> > >
>> >> > > Jiefu Gong
>> >> > > University of California, Berkeley | Class of 2017
>> >> > > B.A Computer Science | College of Letters and Sciences
>> >> > >
>> >> > > jg...@berkeley.edu  | (925) 400-3427
>> >> > >
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > -Regards,
>> >> > Mayuresh R. Gharat
>> >> > (862) 250-7125
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >>
>> >> Jiefu Gong
>> >> University of California, Berkeley | Class of 2017
>> >> B.A Computer Science | College of Letters and Sciences
>> >>
>> >> jg...@berkeley.edu  | (925) 400-3427
>> >>
>> >
>> >
>> >
>> > --
>> > -Regards,
>> > Mayuresh R. Gharat
>> > (862) 250-7125
>> >
>>
>>
>>
>> --
>> -Regards,
>> Mayuresh R. Gharat
>> (862) 250-7125
>>
>
>
>
> --
> Grant Henke
> Solutions Consultant | Cloudera
> ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>



-- 
Grant Henke
Solutions Consultant | Cloudera
ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: Log Deletion Behavior

2015-07-24 Thread Grant Henke
I would actually suggest only using the ms versions of the retention
config. Be sure to check/set all the configs below and look at the
documented explanations here
http://kafka.apache.org/documentation.html#brokerconfigs.

I am guessing the default log.retention.check.interval.ms of 5 minutes is
too long for your case, or you may have changed the default
log.cleanup.policy so that it is no longer set to delete.

log.retention.ms
log.retention.check.interval.ms
log.cleanup.policy
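
For a quick way to see deletion happen, a sketch with illustrative values
(assuming the default delete policy):

  log.retention.ms=60000
  log.retention.check.interval.ms=30000
  log.cleanup.policy=delete

With these, a rolled segment should be removed within roughly 90 seconds of
its messages aging out.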

On Fri, Jul 24, 2015 at 9:14 PM, Mayuresh Gharat  wrote:

> To add on, the main thing here is you should be using only one of these
> properties.
>
> Thanks,
>
> Mayuresh
>
> On Fri, Jul 24, 2015 at 6:47 PM, Mayuresh Gharat <
> gharatmayures...@gmail.com
> > wrote:
>
> > Yes. It should. Do not set other retention settings. Just use the "hours"
> > settings.
> > Let me know about this :)
> >
> > Thanks,
> >
> > Mayuresh
> >
> > On Fri, Jul 24, 2015 at 6:43 PM, JIEFU GONG  wrote:
> >
> >> Mayuresh, thanks for your comment. I won't be able to change these
> >> settings until next Monday, but just to confirm: you are saying that if
> >> I restart the brokers, my logs should delete themselves with respect to
> >> the newest settings, correct?
> >>
> >> On Fri, Jul 24, 2015 at 6:29 PM, Mayuresh Gharat <
> >> gharatmayures...@gmail.com
> >> > wrote:
> >>
> >> > No. This should not happen. At LinkedIn we just use the log retention
> >> > hours. Try using that. Change it and bounce the broker. It should work.
> >> > Also, looking back at the configs, I am not sure why we have three
> >> > different configs for the same property:
> >> >
> >> > "log.retention.ms"
> >> > "log.retention.minutes"
> >> > "log.retention.hours"
> >> >
> >> > We should probably have just the milliseconds version.
> >> >
> >> > Thanks,
> >> >
> >> > Mayuresh
> >> >
> >> > On Fri, Jul 24, 2015 at 4:12 PM, JIEFU GONG 
> wrote:
> >> >
> >> > > Hi all,
> >> > >
> >> > > I have a few broad questions on how log deletion works, specifically
> >> in
> >> > > conjunction with the log.retention.time setting. Say I published
> some
> >> > > messages to some topics when the configuration was originally set to
> >> > > something like log.retention.hours=168 (default). If I publish these
> >> > > messages successfully, then later set the configuration to something
> >> like
> >> > > log.retention.minutes=1, are those logs supposed to persist for the
> >> > newest
> >> > > settings or the old settings? Right now my logs are refusing to
> delete
> >> > > themselves unless I specifically mark them for deletion -- is this
> the
> >> > > correct/anticipated/wanted behavior?
> >> > >
> >> > > Thanks for the help!
> >> > >
> >> > > --
> >> > >
> >> > > Jiefu Gong
> >> > > University of California, Berkeley | Class of 2017
> >> > > B.A Computer Science | College of Letters and Sciences
> >> > >
> >> > > jg...@berkeley.edu  | (925) 400-3427
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > -Regards,
> >> > Mayuresh R. Gharat
> >> > (862) 250-7125
> >> >
> >>
> >>
> >>
> >> --
> >>
> >> Jiefu Gong
> >> University of California, Berkeley | Class of 2017
> >> B.A Computer Science | College of Letters and Sciences
> >>
> >> jg...@berkeley.edu  | (925) 400-3427
> >>
> >
> >
> >
> > --
> > -Regards,
> > Mayuresh R. Gharat
> > (862) 250-7125
> >
>
>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
>



-- 
Grant Henke
Solutions Consultant | Cloudera
ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: Log Deletion Behavior

2015-07-24 Thread Mayuresh Gharat
To add on, the main thing here is you should be using only one of these
properties.

Thanks,

Mayuresh

On Fri, Jul 24, 2015 at 6:47 PM, Mayuresh Gharat  wrote:

> Yes. It should. Do not set other retention settings. Just use the "hours"
> settings.
> Let me know about this :)
>
> Thanks,
>
> Mayuresh
>
> On Fri, Jul 24, 2015 at 6:43 PM, JIEFU GONG  wrote:
>
>> Mayuresh, thanks for your comment. I won't be able to change these
>> settings until next Monday, but just to confirm: you are saying that if
>> I restart the brokers, my logs should delete themselves with respect to
>> the newest settings, correct?
>>
>> On Fri, Jul 24, 2015 at 6:29 PM, Mayuresh Gharat <
>> gharatmayures...@gmail.com
>> > wrote:
>>
>> > No. This should not happen. At LinkedIn we just use the log retention
>> > hours. Try using that. Change it and bounce the broker. It should work.
>> > Also, looking back at the configs, I am not sure why we have three
>> > different configs for the same property:
>> >
>> > "log.retention.ms"
>> > "log.retention.minutes"
>> > "log.retention.hours"
>> >
>> > We should probably have just the milliseconds version.
>> >
>> > Thanks,
>> >
>> > Mayuresh
>> >
>> > On Fri, Jul 24, 2015 at 4:12 PM, JIEFU GONG  wrote:
>> >
>> > > Hi all,
>> > >
>> > > I have a few broad questions on how log deletion works, specifically
>> in
>> > > conjunction with the log.retention.time setting. Say I published some
>> > > messages to some topics when the configuration was originally set to
>> > > something like log.retention.hours=168 (default). If I publish these
>> > > messages successfully, then later set the configuration to something
>> like
>> > > log.retention.minutes=1, are those logs supposed to persist for the
>> > newest
>> > > settings or the old settings? Right now my logs are refusing to delete
>> > > themselves unless I specifically mark them for deletion -- is this the
>> > > correct/anticipated/wanted behavior?
>> > >
>> > > Thanks for the help!
>> > >
>> > > --
>> > >
>> > > Jiefu Gong
>> > > University of California, Berkeley | Class of 2017
>> > > B.A Computer Science | College of Letters and Sciences
>> > >
>> > > jg...@berkeley.edu  | (925) 400-3427
>> > >
>> >
>> >
>> >
>> > --
>> > -Regards,
>> > Mayuresh R. Gharat
>> > (862) 250-7125
>> >
>>
>>
>>
>> --
>>
>> Jiefu Gong
>> University of California, Berkeley | Class of 2017
>> B.A Computer Science | College of Letters and Sciences
>>
>> jg...@berkeley.edu  | (925) 400-3427
>>
>
>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
>



-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125


Re: Log Deletion Behavior

2015-07-24 Thread JIEFU GONG
Okay, I will look into using only the hours setting, but I think that means
the minimum time a log can be stored is 1 hour, right? I think the last time
I tried, Kafka failed to parse decimal values.

On Fri, Jul 24, 2015 at 6:47 PM, Mayuresh Gharat  wrote:

> Yes. It should. Do not set other retention settings. Just use the "hours"
> settings.
> Let me know about this :)
>
> Thanks,
>
> Mayuresh
>
> On Fri, Jul 24, 2015 at 6:43 PM, JIEFU GONG  wrote:
>
> > Mayuresh, thanks for your comment. I won't be able to change these
> > settings until next Monday, but just to confirm: you are saying that if
> > I restart the brokers, my logs should delete themselves with respect to
> > the newest settings, correct?
> >
> > On Fri, Jul 24, 2015 at 6:29 PM, Mayuresh Gharat <
> > gharatmayures...@gmail.com
> > > wrote:
> >
> > > No. This should not happen. At LinkedIn we just use the log retention
> > > hours. Try using that. Change it and bounce the broker. It should work.
> > > Also, looking back at the configs, I am not sure why we have three
> > > different configs for the same property:
> > >
> > > "log.retention.ms"
> > > "log.retention.minutes"
> > > "log.retention.hours"
> > >
> > > We should probably have just the milliseconds version.
> > >
> > > Thanks,
> > >
> > > Mayuresh
> > >
> > > On Fri, Jul 24, 2015 at 4:12 PM, JIEFU GONG 
> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I have a few broad questions on how log deletion works, specifically
> in
> > > > conjunction with the log.retention.time setting. Say I published some
> > > > messages to some topics when the configuration was originally set to
> > > > something like log.retention.hours=168 (default). If I publish these
> > > > messages successfully, then later set the configuration to something
> > like
> > > > log.retention.minutes=1, are those logs supposed to persist for the
> > > newest
> > > > settings or the old settings? Right now my logs are refusing to
> delete
> > > > themselves unless I specifically mark them for deletion -- is this
> the
> > > > correct/anticipated/wanted behavior?
> > > >
> > > > Thanks for the help!
> > > >
> > > > --
> > > >
> > > > Jiefu Gong
> > > > University of California, Berkeley | Class of 2017
> > > > B.A Computer Science | College of Letters and Sciences
> > > >
> > > > jg...@berkeley.edu  | (925) 400-3427
> > > >
> > >
> > >
> > >
> > > --
> > > -Regards,
> > > Mayuresh R. Gharat
> > > (862) 250-7125
> > >
> >
> >
> >
> > --
> >
> > Jiefu Gong
> > University of California, Berkeley | Class of 2017
> > B.A Computer Science | College of Letters and Sciences
> >
> > jg...@berkeley.edu  | (925) 400-3427
> >
>
>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
>



-- 

Jiefu Gong
University of California, Berkeley | Class of 2017
B.A Computer Science | College of Letters and Sciences

jg...@berkeley.edu  | (925) 400-3427


error while high level consumer

2015-07-24 Thread Kris K
Hi,

I started seeing the errors below continuously in the logs when I try to
bring the high-level consumer up. Please help.


ZookeeperConsumerConnector [INFO] [XXX], waiting for the partition
ownership to be deleted: 1

ZookeeperConsumerConnector [INFO] [XXX], end rebalancing
consumer XXX try #0

ZookeeperConsumerConnector [INFO] [XXX], Rebalancing attempt failed.
Clearing the cache before the next rebalancing operation is triggered


Thanks,

Kris


Re: Log Deletion Behavior

2015-07-24 Thread Mayuresh Gharat
Yes. It should. Do not set other retention settings. Just use the "hours"
settings.
Let me know about this :)

Thanks,

Mayuresh

On Fri, Jul 24, 2015 at 6:43 PM, JIEFU GONG  wrote:

> Mayuresh, thanks for your comment. I won't be able to change these settings
> until next Monday, but just to confirm: you are saying that if I restart the
> brokers, my logs should delete themselves with respect to the newest
> settings, correct?
>
> On Fri, Jul 24, 2015 at 6:29 PM, Mayuresh Gharat <
> gharatmayures...@gmail.com
> > wrote:
>
> > No. This should not happen. At LinkedIn we just use the log retention
> > hours. Try using that. Change it and bounce the broker. It should work.
> > Also, looking back at the configs, I am not sure why we have three
> > different configs for the same property:
> >
> > "log.retention.ms"
> > "log.retention.minutes"
> > "log.retention.hours"
> >
> > We should probably have just the milliseconds version.
> >
> > Thanks,
> >
> > Mayuresh
> >
> > On Fri, Jul 24, 2015 at 4:12 PM, JIEFU GONG  wrote:
> >
> > > Hi all,
> > >
> > > I have a few broad questions on how log deletion works, specifically in
> > > conjunction with the log.retention.time setting. Say I published some
> > > messages to some topics when the configuration was originally set to
> > > something like log.retention.hours=168 (default). If I publish these
> > > messages successfully, then later set the configuration to something
> like
> > > log.retention.minutes=1, are those logs supposed to persist for the
> > newest
> > > settings or the old settings? Right now my logs are refusing to delete
> > > themselves unless I specifically mark them for deletion -- is this the
> > > correct/anticipated/wanted behavior?
> > >
> > > Thanks for the help!
> > >
> > > --
> > >
> > > Jiefu Gong
> > > University of California, Berkeley | Class of 2017
> > > B.A Computer Science | College of Letters and Sciences
> > >
> > > jg...@berkeley.edu  | (925) 400-3427
> > >
> >
> >
> >
> > --
> > -Regards,
> > Mayuresh R. Gharat
> > (862) 250-7125
> >
>
>
>
> --
>
> Jiefu Gong
> University of California, Berkeley | Class of 2017
> B.A Computer Science | College of Letters and Sciences
>
> jg...@berkeley.edu  | (925) 400-3427
>



-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125


Re: Log Deletion Behavior

2015-07-24 Thread JIEFU GONG
Mayuresh, thanks for your comment. I won't be able to change these settings
until next Monday, but just to confirm: you are saying that if I restart the
brokers, my logs should delete themselves with respect to the newest
settings, correct?

On Fri, Jul 24, 2015 at 6:29 PM, Mayuresh Gharat  wrote:

> No. This should not happen. At LinkedIn we just use the log retention
> hours. Try using that. Change it and bounce the broker. It should work.
> Also, looking back at the configs, I am not sure why we have three
> different configs for the same property:
>
> "log.retention.ms"
> "log.retention.minutes"
> "log.retention.hours"
>
> We should probably have just the milliseconds version.
>
> Thanks,
>
> Mayuresh
>
> On Fri, Jul 24, 2015 at 4:12 PM, JIEFU GONG  wrote:
>
> > Hi all,
> >
> > I have a few broad questions on how log deletion works, specifically in
> > conjunction with the log.retention.time setting. Say I published some
> > messages to some topics when the configuration was originally set to
> > something like log.retention.hours=168 (default). If I publish these
> > messages successfully, then later set the configuration to something like
> > log.retention.minutes=1, are those logs supposed to persist for the
> newest
> > settings or the old settings? Right now my logs are refusing to delete
> > themselves unless I specifically mark them for deletion -- is this the
> > correct/anticipated/wanted behavior?
> >
> > Thanks for the help!
> >
> > --
> >
> > Jiefu Gong
> > University of California, Berkeley | Class of 2017
> > B.A Computer Science | College of Letters and Sciences
> >
> > jg...@berkeley.edu  | (925) 400-3427
> >
>
>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
>



-- 

Jiefu Gong
University of California, Berkeley | Class of 2017
B.A Computer Science | College of Letters and Sciences

jg...@berkeley.edu  | (925) 400-3427


Re: Log Deletion Behavior

2015-07-24 Thread Mayuresh Gharat
No. This should not happen. At LinkedIn we just use the log retention
hours. Try using that. Change it and bounce the broker. It should work.
Also, looking back at the configs, I am not sure why we have three
different configs for the same property:

"log.retention.ms"
"log.retention.minutes"
"log.retention.hours"

We should probably have just the milliseconds version.

Thanks,

Mayuresh

On Fri, Jul 24, 2015 at 4:12 PM, JIEFU GONG  wrote:

> Hi all,
>
> I have a few broad questions on how log deletion works, specifically in
> conjunction with the log.retention.time setting. Say I published some
> messages to some topics when the configuration was originally set to
> something like log.retention.hours=168 (default). If I publish these
> messages successfully, then later set the configuration to something like
> log.retention.minutes=1, are those logs supposed to persist for the newest
> settings or the old settings? Right now my logs are refusing to delete
> themselves unless I specifically mark them for deletion -- is this the
> correct/anticipated/wanted behavior?
>
> Thanks for the help!
>
> --
>
> Jiefu Gong
> University of California, Berkeley | Class of 2017
> B.A Computer Science | College of Letters and Sciences
>
> jg...@berkeley.edu  | (925) 400-3427
>



-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125


Changing error codes from the simple consumer?

2015-07-24 Thread Coolbeth, Matthew
I am having a strange interaction with the simple consumer API in Kafka 0.8.2.

Running the following code:


public FetchedMessageList send(String topic, int partition, long offset,
                               int fetchSize) throws KafkaException {
    try {
        FetchedMessageList results = new FetchedMessageList();
        FetchRequest req = new FetchRequestBuilder()
            .clientId(clientId)
            .addFetch(topic, partition, offset, fetchSize)
            .minBytes(1)
            .maxWait(250)
            .build();
        FetchResponse resp = consumer.fetch(req);
        if (resp.hasError()) {
            int code = resp.errorCode(topic, partition);
            if (code == 9) { // 9 = ReplicaNotAvailable in the 0.8.x ErrorMapping
                // note: errorCode() is evaluated a second time here
                logger.warn("Fetch response contained error code: {}",
                    resp.errorCode(topic, partition));
            } else {
                throw new RuntimeException(String.format(
                    "Fetch response contained error code %d", code));
            }
        }

Following a leader election, the FetchResponse has an error, and trips the 
(code == 9) condition, causing the program to emit a message via logger.warn.
However, the emitted log message reads “Fetch response contained error code: 6”.
I looked over the Kafka source code and don’t see how resp.errorCode might 
return a different value the second time.
Has anyone seen this before?

Thanks,

Matt Coolbeth


Any tool to easily fetch a single message or a few messages starting from a given offset then exit after fetching said count of messages?

2015-07-24 Thread David Luu
Hi,

I notice the kafka-console-consumer.sh script has an option to fetch a
maximum number of messages, which can be 1 or more, and then exit, which is
nice. But as a high-level consumer, it's missing an option to fetch from a
given offset other than the earliest and latest offsets.

Is there any off-the-shelf tool (CLI, simple script, etc.) that I can use
to fetch 1 or more messages from a given offset of some topic as a
consumer, then exit? Or do I have to write my own tool to do this with any
of the kafka libraries in various languages? If the latter, is there one
with which I can whip it up in the fewest lines possible? I don't want to
have to read up and spend time crafting a tool for this if I don't have to.

I wish this feature were already built into the kafka toolchain.

Regards,
David


Log Deletion Behavior

2015-07-24 Thread JIEFU GONG
Hi all,

I have a few broad questions on how log deletion works, specifically in
conjunction with the log.retention.time setting. Say I published some
messages to some topics when the configuration was originally set to
something like log.retention.hours=168 (default). If I publish these
messages successfully, then later set the configuration to something like
log.retention.minutes=1, are those logs supposed to persist for the newest
settings or the old settings? Right now my logs are refusing to delete
themselves unless I specifically mark them for deletion -- is this the
correct/anticipated/wanted behavior?

Thanks for the help!

-- 

Jiefu Gong
University of California, Berkeley | Class of 2017
B.A Computer Science | College of Letters and Sciences

jg...@berkeley.edu  | (925) 400-3427


deleting data automatically

2015-07-24 Thread Yuheng Du
Hi,

I am testing the kafka producer performance, so I created a queue and am
writing a large amount of data to that queue.

Is there a way to delete the data automatically after some time, say
whenever the data size reaches 50GB or the retention time exceeds 10
seconds, so my disk won't fill up and new data can still be written?

Thanks!


New consumer - offset one gets in poll is not offset one is supposed to commit

2015-07-24 Thread Stevo Slavić
Hello Apache Kafka community,

Say there is only one topic, with a single partition and a single message
on it.
Calling poll with the new consumer will return a ConsumerRecord for that
message, and it will have an offset of 0.

After processing the message, the current KafkaConsumer implementation
expects one to commit not offset 0 as processed, but offset 1 - the next
offset/position one would like to consume.

Does this sound strange to you as well?

I'm wondering whether this offset+1 handling for the next position to read
couldn't be done in one place - in the KafkaConsumer implementation, or the
broker, or wherever - instead of every user of KafkaConsumer having to do it.
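
For illustration, this is the boilerplate every caller ends up writing (a
sketch; process() is a hypothetical handler, and the exact commit signature
may differ between versions of the new consumer):

  // imports: org.apache.kafka.clients.consumer.*,
  //          org.apache.kafka.common.TopicPartition, java.util.Collections
  ConsumerRecords<String, String> records = consumer.poll(100);
  for (ConsumerRecord<String, String> record : records) {
      process(record);
      // note the +1: commit the next position to read, not the offset
      // of the record just processed
      consumer.commitSync(Collections.singletonMap(
          new TopicPartition(record.topic(), record.partition()),
          new OffsetAndMetadata(record.offset() + 1)));
  }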

Kind regards,
Stevo Slavic.


Re: properducertest on multiple nodes

2015-07-24 Thread Yuheng Du
I deleted the queue and recreated it before running the test. Things are
working after restarting the broker cluster, thanks!

On Fri, Jul 24, 2015 at 12:06 PM, Gwen Shapira 
wrote:

> Does topic "speedx1" exist?
>
> On Fri, Jul 24, 2015 at 7:09 AM, Yuheng Du 
> wrote:
> > Hi,
> >
> > I am trying to run 20 performance tests on 10 nodes using pbsdsh.
> >
> > The messages are sent to a 6-broker cluster. It seems to work for a
> > while. When I delete the test queue and rerun the test, the broker does
> > not seem to process incoming messages:
> >
> > [yuhengd@node1739 kafka_2.10-0.8.2.1]$ bin/kafka-run-class.sh
> > org.apache.kafka.clients.tools.ProducerPerformance speedx1 5000 100
> -1
> > acks=1 bootstrap.servers=130.127.133.72:9092 buffer.memory=67108864
> > batch.size=8196
> >
> > 1 records sent, 0.0 records/sec (0.00 MB/sec), 6.0 ms avg latency,
> > 6.0 max latency.
> >
> > org.apache.kafka.common.errors.TimeoutException: Failed to update
> metadata
> > after 79 ms.
> >
> >
> > Can anyone suggest what the problem is? Or what configuration should I
> > change? Thanks!
>


Re: New consumer - partitions auto assigned only on poll

2015-07-24 Thread Stevo Slavić
Hello Jason,

Thanks for feedback. I've created ticket for this
https://issues.apache.org/jira/browse/KAFKA-2359
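
In the meantime, the workaround is to poll once after subscribing, e.g. (a
sketch; the subscribe() and subscriptions() signatures are as on the current
trunk API and may still change):

  consumer.subscribe("mytopic");
  consumer.poll(0);  // triggers the initial partition assignment
  Set<TopicPartition> assigned = consumer.subscriptions();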

Kind regards,
Stevo Slavic.

On Wed, Jul 22, 2015 at 6:18 PM, Jason Gustafson  wrote:

> Hey Stevo,
>
> That's a good point. I think the javadoc is pretty clear that this could
> return no partitions when the consumer has no active assignment, but it may
> be a little unintuitive to have to call poll() after subscribing before you
> can get the assigned partitions. I can't think of a strong reason not to go
> ahead with the assignment in subscriptions() other than to keep it
> non-blocking. Perhaps you can open a ticket and we can get feedback from
> some other devs?
>
> Thanks,
> Jason
>
> On Wed, Jul 22, 2015 at 2:09 AM, Stevo Slavić  wrote:
>
> > Hello Apache Kafka community,
> >
> > In the new consumer I encountered unexpected behavior. After
> > constructing a KafkaConsumer instance with a configured consumer
> > rebalance callback handler, and subscribing to a topic with
> > "consumer.subscribe(topic)", retrieving subscriptions would return an
> > empty set and the callback handler would never get called (no partitions
> > ever assigned or revoked), no matter how long the instance was up.
> >
> > Then I found by inspecting KafkaConsumer code that partition assignment
> > will only be triggered on first poll, pollOnce has:
> >
> > // ensure we have partitions assigned if we expect to
> > if (subscriptions.partitionsAutoAssigned())
> > coordinator.ensurePartitionAssignment();
> >
> > Would it make sense to include this fragment in
> KafkaConsumer.subscriptions
> > accessor as well?
> >
> > Kind regards,
> > Stevo Slavic.
> >
>


Re: properducertest on multiple nodes

2015-07-24 Thread Gwen Shapira
Does topic "speedx1" exist?
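
You can check with the topics tool, e.g. (the zookeeper address is a
placeholder):

  bin/kafka-topics.sh --zookeeper zkhost:2181 --describe --topic speedx1

If the topic doesn't exist and auto.create.topics.enable is off, producers
will fail until it is recreated.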

On Fri, Jul 24, 2015 at 7:09 AM, Yuheng Du  wrote:
> Hi,
>
> I am trying to run 20 performance tests on 10 nodes using pbsdsh.
>
> The messages are sent to a 6-broker cluster. It seems to work for a
> while. When I delete the test queue and rerun the test, the broker does not
> seem to process incoming messages:
>
> [yuhengd@node1739 kafka_2.10-0.8.2.1]$ bin/kafka-run-class.sh
> org.apache.kafka.clients.tools.ProducerPerformance speedx1 5000 100 -1
> acks=1 bootstrap.servers=130.127.133.72:9092 buffer.memory=67108864
> batch.size=8196
>
> 1 records sent, 0.0 records/sec (0.00 MB/sec), 6.0 ms avg latency,
> 6.0 max latency.
>
> org.apache.kafka.common.errors.TimeoutException: Failed to update metadata
> after 79 ms.
>
>
> Can anyone suggest what the problem is? Or what configuration should I
> change? Thanks!


Re: Hdfs FsShell getmerge

2015-07-24 Thread Jan Filipiak

Sorry, wrong mailing list.

On 24.07.2015 16:44, Jan Filipiak wrote:

Hello hadoop users,

I have an idea about a small feature for the getmerge tool. I recently
needed to use the newline option -nl because the files I needed to merge
simply didn't have one.
I was merging all the files from one directory, and unfortunately this
directory also included empty files, which effectively led to multiple
newlines appended after some files.

I needed to remove them manually afterwards.

In this situation it may be good to have another argument that allows
skipping empty files. I wrote down two changes one could try, at the end.
Do you guys consider this a good improvement to the command line tools?


Things one could try to implement this feature:

The call to IOUtils.copyBytes(in, out, getConf(), false) doesn't
return the number of bytes copied, which would be convenient, as one
could skip appending the newline when 0 bytes were copied.

Or one could check the file size beforehand.

Please let me know if you consider this useful and worth a
feature ticket in Jira.


Thank you
Jan




Hdfs FsShell getmerge

2015-07-24 Thread Jan Filipiak

Hello hadoop users,

I have an idea about a small feature for the getmerge tool. I recently
needed to use the newline option -nl because the files I needed to merge
simply didn't have one.
I was merging all the files from one directory, and unfortunately this
directory also included empty files, which effectively led to multiple
newlines appended after some files.

I needed to remove them manually afterwards.

In this situation it may be good to have another argument that allows
skipping empty files. I wrote down two changes one could try, at the end.
Do you guys consider this a good improvement to the command line tools?


Things one could try to implement this feature:

The call to IOUtils.copyBytes(in, out, getConf(), false) doesn't
return the number of bytes copied, which would be convenient, as one could
skip appending the newline when 0 bytes were copied.

Or one could check the file size beforehand.
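
A rough sketch of the size-check variant (names are illustrative; uses
org.apache.hadoop.fs.FileStatus and org.apache.hadoop.io.IOUtils):

  for (FileStatus stat : srcFs.listStatus(srcDir)) {
      if (stat.getLen() == 0) {
          continue; // skip empty files so no stray newline is appended
      }
      try (FSDataInputStream in = srcFs.open(stat.getPath())) {
          IOUtils.copyBytes(in, out, getConf(), false);
          if (addNewline) {
              out.write('\n');
          }
      }
  }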

Please let me know if you consider this useful and worth a
feature ticket in Jira.


Thank you
Jan


properducertest on multiple nodes

2015-07-24 Thread Yuheng Du
Hi,

I am trying to run 20 performance tests on 10 nodes using pbsdsh.

The messages are sent to a 6-broker cluster. It seems to work for a
while. When I delete the test queue and rerun the test, the broker does not
seem to process incoming messages:

[yuhengd@node1739 kafka_2.10-0.8.2.1]$ bin/kafka-run-class.sh
org.apache.kafka.clients.tools.ProducerPerformance speedx1 5000 100 -1
acks=1 bootstrap.servers=130.127.133.72:9092 buffer.memory=67108864
batch.size=8196

1 records sent, 0.0 records/sec (0.00 MB/sec), 6.0 ms avg latency,
6.0 max latency.

org.apache.kafka.common.errors.TimeoutException: Failed to update metadata
after 79 ms.


Can anyone suggest what the problem is? Or what configuration should I
change? Thanks!