[ 
https://issues.apache.org/jira/browse/KAFKA-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195538#comment-15195538
 ] 

ASF GitHub Bot commented on KAFKA-3250:
---------------------------------------

GitHub user granthenke opened a pull request:

    https://github.com/apache/kafka/pull/1075

    KAFKA-3250: release tarball is unnecessarily large due to duplicate libraries
    
    This ensures duplicates are not copied into the distribution without
    rewriting all of the tar'ing logic. A larger improvement could be made to
    the packaging code, but that should be tracked in a separate JIRA issue.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/granthenke/kafka libs-duplicates

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/1075.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1075
    
----
commit 8cdbf18fb5b751e0fc922d405643c152daaef4d1
Author: Grant Henke <granthe...@gmail.com>
Date:   2016-03-15T15:53:43Z

    KAFKA-3250: release tarball is unnecessarily large due to duplicate
    libraries
    
    This ensures duplicates are not copied into the distribution without
    rewriting all of the tar'ing logic. A larger improvement could be made to
    the packaging code, but that should be tracked in a separate JIRA issue.

----


> release tarball is unnecessarily large due to duplicate libraries
> -----------------------------------------------------------------
>
>                 Key: KAFKA-3250
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3250
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.9.0.1
>            Reporter: Gwen Shapira
>            Assignee: Grant Henke
>             Fix For: 0.10.0.0
>
>
> Between 0.8.2.2 and 0.9.0, our release tarballs grew from 17M to 34M. We
> thought it was just due to new libraries and dependencies. But:
> 1. If you untar Kafka into a directory and check the directory size (du -sh),
> it is around 28M, smaller than the tarball. Recompressing gives you a 25M
> tarball.
> 2. If you list the original tar contents and grep for "snappy", you see it 4
> times in the tarball.
> Clearly we are creating a tarball with duplicates (and we didn't before).
> I think it's due to how we are generating the tarball from core but pulling
> other projects into the libs/ directory with their dependencies (which overlap).
> We need to find out how to sort it out (possibly with excludes).
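
The check described in the issue (listing the tar contents and looking for a library that appears more than once) can be sketched in Python as below. This is only an illustration, not part of the pull request: the function names and the sample jar paths are hypothetical, and it assumes the duplicates are entries stored under the same path, which is what an unpacked directory smaller than the tarball suggests.

```python
import io
import tarfile
from collections import Counter


def duplicate_members(tar):
    """Return {member path: count} for regular files stored more than once."""
    counts = Counter(m.name for m in tar.getmembers() if m.isfile())
    return {name: n for name, n in counts.items() if n > 1}


def build_sample_archive():
    """Build a small in-memory tar that stores one (hypothetical) jar path twice."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        # The names below are made up for the example, not real Kafka artifacts.
        for name in ("libs/snappy-java.jar", "libs/example.jar", "libs/snappy-java.jar"):
            payload = b"placeholder"
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


if __name__ == "__main__":
    with tarfile.open(fileobj=io.BytesIO(build_sample_archive()), mode="r") as tar:
        for name, n in sorted(duplicate_members(tar).items()):
            print(f"{n}x {name}")
```

To inspect a real release, the same function could be called on `tarfile.open(path)`, where `path` points at a downloaded tarball; the default read mode detects gzip compression automatically.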



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
