[jira] [Commented] (KAFKA-5236) Regression in on-disk log size when using Snappy compression with 0.8.2 log message format

2017-06-02 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035417#comment-16035417
 ] 

Ismael Juma commented on KAFKA-5236:


Thanks for verifying [~nickt].

> Regression in on-disk log size when using Snappy compression with 0.8.2 log 
> message format
> --
>
> Key: KAFKA-5236
> URL: https://issues.apache.org/jira/browse/KAFKA-5236
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.2.1
>Reporter: Nick Travers
>Assignee: Ismael Juma
>  Labels: regression
> Fix For: 0.11.0.0
>
>
> We recently upgraded our brokers in our production environments from 0.10.1.1 
> to 0.10.2.1 and we've noticed a sizable regression in the on-disk .log file 
> size. For some deployments the increase was as much as 50%.
> We run our brokers with the 0.8.2 log message format version. The majority of 
> our message volume comes from 0.10.x Java clients sending messages encoded 
> with the Snappy codec.
> Some initial testing shows a regression between the two versions only when 
> using Snappy compression with a log message format of 0.8.2.
> I also tested the 0.10.x log message formats as well as Gzip compression. The 
> log sizes do not differ in those cases, so the issue seems confined to the 
> 0.8.2 message format combined with Snappy compression.
> A git-bisect led me to this commit, which modified the server-side 
> implementation of `Record`:
> https://github.com/apache/kafka/commit/67f1e5b91bf073151ff57d5d656693e385726697
> Here's the PR, which has more context:
> https://github.com/apache/kafka/pull/2140
> Here is a link to the test I used to reproduce this issue:
> https://github.com/nicktrav/kafka/commit/68e8db4fa525e173651ac740edb270b0d90b8818
> cc: [~hachikuji] [~junrao] [~ijuma] [~guozhang] (tagged on original PR)
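
A minimal sketch of the kind of reproduction described in the report above (not 
the linked test itself; the broker address, topic name, and payload below are 
placeholders): run a broker with log.message.format.version=0.8.2, produce 
Snappy-compressed messages from a 0.10.x Java client, and compare the on-disk 
.log segment sizes across broker versions.

{code:java}
import java.util.Properties;
import java.util.Random;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Rough reproduction sketch only. The broker side is assumed to be configured
// with log.message.format.version=0.8.2; names and sizes here are placeholders.
public class SnappyLogSizeRepro {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder address
        props.put("compression.type", "snappy");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");

        byte[] value = new byte[1024];
        new Random(0).nextBytes(value);                      // synthetic payload

        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 100_000; i++) {
                producer.send(new ProducerRecord<>("snappy-repro-topic", value));
            }
        }
        // Afterwards, compare the sizes of the .log segment files under the
        // topic's partition directories across broker versions.
    }
}
{code}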





[jira] [Commented] (KAFKA-5236) Regression in on-disk log size when using Snappy compression with 0.8.2 log message format

2017-06-02 Thread Nick Travers (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035404#comment-16035404
 ] 

Nick Travers commented on KAFKA-5236:
-

Thanks for the patch [~ijuma]! I ran the same tests I used originally and 
confirmed that there was no regression in the on-disk size.



[jira] [Commented] (KAFKA-5236) Regression in on-disk log size when using Snappy compression with 0.8.2 log message format

2017-06-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035353#comment-16035353
 ] 

ASF GitHub Bot commented on KAFKA-5236:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3205




[jira] [Commented] (KAFKA-5236) Regression in on-disk log size when using Snappy compression with 0.8.2 log message format

2017-06-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16034453#comment-16034453
 ] 

ASF GitHub Bot commented on KAFKA-5236:
---

GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/3205

KAFKA-5236; Increase the block/buffer size when compressing with Snappy and Gzip

We had originally increased Snappy’s block size as part of KAFKA-3704. However,
we had some issues with excessive memory usage in the producer and we reverted it
in 7c6ee8d5e.

After more investigation, we fixed the underlying reason why memory usage seemed
to grow much more than expected in KAFKA-3747 (included in 0.10.0.1).

In 0.10.2, we changed the broker to use the same classes as the producer, and the
broker’s block size for Snappy was changed from 32 KB to 1 KB. As reported in
KAFKA-5236, the on-disk size is, in some cases, 50% larger when the data is
compressed with a 1 KB block size instead of 32 KB.

As discussed in KAFKA-3704, it may be worth making this configurable and/or
allocating the compression buffers from the producer pool. However, for 0.11.0.0,
I think the simplest thing to do is to default to 32 KB for Snappy (the default
if no block size is provided).

I also increased the Gzip buffer size. 1 KB is too small and the default is
smaller still (512 bytes). 8 KB (which is the default buffer size for
BufferedOutputStream) seemed like a reasonable default.
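
As a rough illustration of the sizes discussed above (a sketch, not the patch 
itself; the class and method names are made up), snappy-java's SnappyOutputStream 
and java.util.zip.GZIPOutputStream both accept an explicit block/buffer size:

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

import org.xerial.snappy.SnappyOutputStream;

// Illustrative sketch only: shows the block/buffer sizes discussed above,
// not Kafka's actual compression wrapper code. Requires snappy-java on the classpath.
public class CompressionDefaultsSketch {

    // 32 KB: snappy-java's own default block size, restored by the change above.
    static final int SNAPPY_BLOCK_SIZE = 32 * 1024;
    // 8 KB: matches BufferedOutputStream's default buffer size.
    static final int GZIP_BUFFER_SIZE = 8 * 1024;

    static OutputStream wrapSnappy(OutputStream out) {
        return new SnappyOutputStream(out, SNAPPY_BLOCK_SIZE);
    }

    static OutputStream wrapGzip(OutputStream out) throws IOException {
        return new GZIPOutputStream(out, GZIP_BUFFER_SIZE);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream snappy = wrapSnappy(sink)) {
            snappy.write("example payload".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("snappy-compressed bytes: " + sink.size());
    }
}
{code}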

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka kafka-5236-snappy-block-size

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3205.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3205


commit ef4af6757575e694c109074b67e59704ff437b56
Author: Ismael Juma 
Date:   2017-06-02T10:17:23Z

KAFKA-5236; Increase the block/buffer size when compressing with Snappy and Gzip

[jira] [Commented] (KAFKA-5236) Regression in on-disk log size when using Snappy compression with 0.8.2 log message format

2017-05-14 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16009863#comment-16009863
 ] 

Ismael Juma commented on KAFKA-5236:


Thanks for the report. The root cause is that the block size for Snappy was 
changed from 32 KB to 1 KB in the broker:

https://github.com/apache/kafka/pull/2140#discussion_r90383989

This is the same block size used by the producer, and with the 0.10.x format 
the broker won't recompress the messages in the common case.

KAFKA-5148 and KAFKA-3704 are related. We should probably use Snappy's default 
block size (32 KB) in both the broker and the producer, and allow the block size 
to be made configurable as per those JIRAs.
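
A quick way to see the effect of the block size (an illustrative experiment, not 
taken from the issue; the payload below is synthetic): compress the same bytes 
with a 1 KB and a 32 KB block and compare the output lengths. With 1 KB blocks, 
Snappy only sees 1 KB of context at a time, so redundancy that spans block 
boundaries goes uncompressed.

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Random;

import org.xerial.snappy.SnappyOutputStream;

// Illustrative experiment only: compresses the same payload with two block
// sizes to show how a smaller block can inflate the compressed output.
public class SnappyBlockSizeComparison {

    static int compressedSize(byte[] payload, int blockSize) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (SnappyOutputStream out = new SnappyOutputStream(sink, blockSize)) {
            out.write(payload);
        }
        return sink.size();
    }

    public static void main(String[] args) throws IOException {
        // Synthetic payload: the same 4 KB record repeated 64 times, so all the
        // redundancy is at a 4 KB distance (invisible within a 1 KB block).
        byte[] record = new byte[4096];
        new Random(42).nextBytes(record);
        ByteArrayOutputStream batch = new ByteArrayOutputStream();
        for (int i = 0; i < 64; i++) {
            batch.write(record);
        }
        byte[] payload = batch.toByteArray();

        System.out.println("1 KB blocks : " + compressedSize(payload, 1024) + " bytes");
        System.out.println("32 KB blocks: " + compressedSize(payload, 32 * 1024) + " bytes");
    }
}
{code}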




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)