On Thu, 2016-09-01 at 13:14 +1000, Allan McRae wrote:
> On 01/09/16 09:44, Gordian Edenhofer wrote:
> > On Thu, 2016-09-01 at 08:28 +1000, Allan McRae wrote:
> > > On 01/09/16 08:08, Dave Reisner wrote:
> > > > On Wed, Aug 31, 2016 at 11:18:32PM +0200, Gordian Edenhofer
> > > > wrote:
> > > > > > > > The second probably would not be accepted...
> > > > > > >
> > > > > > > I urge you to reconsider. Parallelization increases the
> > > > > > > speed of this
> > > > > >
> > > > > > I don't think anyone is suggesting that packaging multiple
> > > > > > things in parallel isn't useful. I already suggested that
> > > > > > nothing needs to be implemented in bacman proper in order
> > > > > > for you to parallelize the work. You can write your own
> > > > > > "pbacman" as simply as:
> > > > > >
> > > > > > for arg; do bacman "$arg" & done; wait
> > > > >
> > > > > There is a huge difference between flooding your system with
> > > > > ~1000 jobs and tightly controlling the maximum number.
> > > > > Adjusting the precise number of jobs enables you to organize
> > > > > your resources, which is itself desirable.
> > > >
> > > > Then use a program like 'parallel' which has this sort of knob.
> > > > I really wonder what it is you're doing that requires running
> > > > bacman with a large number of packages with any regularity.
> > >
> > > Gathering the files etc. takes no time. It is really the
> > > compression that is being made parallel. If only there was a way
> > > to set compression to use multithreading...
> >
> > The actual compression using xz (the default) is not necessarily
> > the most time-intensive part. The linux-headers package, for
> > example, is compressed within a few seconds, but the whole process
> > before xz is run takes far longer. This can be seen with top, or
> > simply by running bacman once without compression and once with it.
> > Moreover, using bacman to parallelize makes it completely
> > independent of the archive format used and still brings gains when
> > recreating multiple packages. At the very least it would fill the
> > gap between the compressions of multiple packages. Therefore it
> > would be beneficial even if compression took the longest, which it
> > doesn't always do.
>
> So read speed is the slow part? And trying to read more files at the
> same time helps?
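As an aside on the job-count point quoted above: the one-liner can be given a concurrency cap in plain bash, without pulling in 'parallel'. A minimal sketch, assuming bash >= 4.3 (for `wait -n`); the `run_limited` function and the MAX_JOBS variable are hypothetical names, not part of bacman:

```shell
#!/bin/bash
# Run one background job per argument, but never more than MAX_JOBS at
# once. Sketch only: assumes bash >= 4.3 for `wait -n`; `run_limited`
# and MAX_JOBS are made-up names for illustration.
MAX_JOBS=${MAX_JOBS:-4}

run_limited() {
    local cmd=$1; shift
    local arg
    for arg in "$@"; do
        # Block until a slot frees up once the cap is reached.
        while (( $(jobs -rp | wc -l) >= MAX_JOBS )); do
            wait -n
        done
        "$cmd" "$arg" &
    done
    wait    # drain the remaining jobs
}

# e.g. run_limited bacman linux linux-headers glibc
```

Compared with the bare `for arg; do ... & done; wait`, this keeps at most MAX_JOBS bacman processes alive, which is the "knob" the quoted discussion is about.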
Obviously read speed is not the limitation here. If it were, bacman
would not speed up with an increasing job count - no matter the
implementation - but it obviously does.

To have a fair comparison I ran the tests again with xz set to use
multiple threads. The results can be seen here [1] and the code is
available here [2]. Tuning xz certainly helps, especially for single
packages, but using multiple jobs brings the real speed boost when
recreating more than one package. That xz can be tuned as well is no
secret; it has been stated in the man page and mentioned in the usage
section from the beginning. Furthermore, the implementation is only a
few additional lines of code, must be invoked explicitly and should in
no case slow anyone down.

Best Regards,
Gordian Edenhofer

[1] http://edh.ddns.net/pacman_ml_bacman_benchmarks/bacman:%20simple%20benchmark.svg
[2] http://edh.ddns.net/pacman_ml_bacman_benchmarks/bacman:%20simple%20benchmark.R.txt
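P.S. For anyone wanting the multithreaded xz run mentioned above as the default for their own archives: makepkg.conf's COMPRESSXZ array controls how makepkg invokes xz, and xz's --threads option enables its multithreaded mode (xz >= 5.2). A sketch only; check the flag against your xz version:

```shell
# /etc/makepkg.conf (excerpt)
# --threads=0 tells xz to spawn one worker per available core. Note
# that multithreaded xz splits input into blocks and may therefore
# produce slightly larger archives than a single-threaded run.
COMPRESSXZ=(xz -c -z - --threads=0)
```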
