Re: Hitting integer limit when setting log segment.bytes

2015-05-14 Thread Lance Laursen
Great, thanks for the link, Mike.

From what I can tell, the only time opening a segment file would be slow
is after an unclean shutdown, where a segment may not have been fsync'd
and Kafka needs to CRC-check it and rebuild its index. This should really
only be a problem for the newest log segment, and only for the limited
subset of topics configured with a large segment size.

However, requiring a larger pointer (and the accompanying increase in
index size) just to accommodate large topics may be a good enough reason
to forgo large segments and embrace opening 20,000 files per restart, as
the common use case for Kafka involves a many-small-messages workload.
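
For a rough sense of that scale, a back-of-envelope sketch in Java (the
per-broker retention footprint here is an assumption, purely for
illustration):

// Back-of-envelope for the "20,000 files per restart" figure above.
// The retention footprint per broker is assumed for illustration.
public class SegmentCount {
    public static void main(String[] args) {
        long retentionBytes = 40L * 1024 * 1024 * 1024 * 1024; // 40 TiB on disk (assumed)
        long segmentBytes = Integer.MAX_VALUE;                 // ~2 GiB, the current cap
        long segments = retentionBytes / segmentBytes;         // ~20,480 segments
        // Kafka holds the log file (and its index) open for every segment,
        // so the ulimit -n budget is roughly twice the segment count.
        System.out.printf("segments=%d, open files ~ %d%n", segments, 2 * segments);
    }
}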

Thanks for the responses.




--


Re: Hitting integer limit when setting log segment.bytes

2015-05-13 Thread Mike Axiak
Jay Kreps has commented on this before:
https://issues.apache.org/jira/browse/KAFKA-1670?focusedCommentId=14161185&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14161185

Basically, you can always have more segment files. Overly large segment
files will significantly slow down the opening of files, which happens
whenever a broker comes online or has to recover.
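
To put a rough number on that slowdown, a sketch under an assumed
sequential-read rate (recovery has to CRC-scan an unflushed segment, so
the cost grows linearly with segment size):

// Rough arithmetic for the recovery slowdown described above.
// The disk throughput figure is an assumption for illustration.
public class RecoveryScan {
    public static void main(String[] args) {
        double diskBytesPerSec = 200e6;     // ~200 MB/s sequential read (assumed)
        long cap = Integer.MAX_VALUE;       // ~2 GiB, today's segment.bytes cap
        long eightGiB = 8L << 30;           // the 8 GiB segment size being requested
        System.out.printf("scan ~%.0f s at the cap vs ~%.0f s at 8 GiB%n",
                cap / diskBytesPerSec, eightGiB / diskBytesPerSec);
    }
}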



Re: Hitting integer limit when setting log segment.bytes

2015-05-13 Thread Mayuresh Gharat
I suppose it is the way log management works in Kafka.
I am not sure of the exact reason for this. Also, the index files that
are constructed map an offset relative to the log file's base offset to
the real (physical) position in the file. Each index entry is a key-value
pair of the form (relative offset, file position).
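
A minimal sketch of that entry layout, assuming the format described
above (a 4-byte offset relative to the segment's base offset plus a
4-byte file position; the 4-byte position is what ties segment size to
the 32-bit limit):

import java.nio.ByteBuffer;

// Sketch of the index entry described above (layout assumed from that
// description): 4 bytes of relative offset plus 4 bytes of physical
// position into the segment file. A 4-byte position is exactly what
// caps a segment at ~2 GiB.
public class IndexEntrySketch {
    static ByteBuffer encode(long baseOffset, long messageOffset, int filePosition) {
        ByteBuffer entry = ByteBuffer.allocate(8);
        entry.putInt((int) (messageOffset - baseOffset)); // offset relative to segment base
        entry.putInt(filePosition);                       // byte position in the log file
        entry.flip();
        return entry;
    }
}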


Thanks,

Mayuresh




Re: Hitting integer limit when setting log segment.bytes

2015-05-13 Thread Lance Laursen
Hey folks,

Any update on this?

On Thu, Apr 30, 2015 at 5:34 PM, Lance Laursen wrote:

> Hey all,
>
> I am attempting to create a topic that uses 8 GB log segment sizes, like
> so:
> ./kafka-topics.sh --zookeeper localhost:2181 --create --topic perftest6p2r
> --partitions 6 --replication-factor 2 --config max.message.bytes=655360
> --config segment.bytes=8589934592
>
> I am getting the following error:
> Error while executing topic command For input string: "8589934592"
> java.lang.NumberFormatException: For input string: "8589934592"
> at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Integer.parseInt(Integer.java:583)
> ...
> ...
>
> Upon further testing with --alter, it would appear that
> segment.bytes will not accept a value higher than 2,147,483,647, which is
> the upper limit for a signed 32-bit int. This restricts log segment
> size to an upper limit of ~2 GB.
>
> We run Kafka on hard-drive-dense machines, each with 10 Gbit uplinks. We
> can raise ulimits to deal with all the open file handles (since Kafka
> keeps all log segment file handles open), but it would be preferable to
> minimize this number, as well as the amount of log segment rollover
> experienced at high traffic (i.e., a rollover every 1-2 seconds or so
> when saturating 10 GbE).
>
> Is there a reason (performance or otherwise) that a 32-bit integer is used
> rather than something larger?
>
> Thanks,
> -Lance
>
>
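
For reference, the failure quoted above is easy to reproduce: per the
Integer.parseInt frame in the stack trace, the config value is read as a
32-bit int, and 8589934592 does not fit. A minimal sketch:

// Minimal reproduction of the quoted failure: 8589934592 (8 GiB)
// exceeds Integer.MAX_VALUE (2147483647), so parsing it as an int
// throws the NumberFormatException shown in the stack trace.
public class ParseDemo {
    public static void main(String[] args) {
        String segmentBytes = "8589934592";
        System.out.println(Long.parseLong(segmentBytes));   // fine: fits in a long
        System.out.println(Integer.parseInt(segmentBytes)); // NumberFormatException
    }
}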


-- 

Leading the Automation of Advertising

LANCE LAURSEN | Systems Architect

••• (M) 310.903.0546

12181 BLUFF CREEK DRIVE, 4TH FLOOR, PLAYA VISTA, CA 90094

RUBICONPROJECT.COM | @RUBICONPROJECT