On Thu, Jan 23, 2020, 9:04 AM Stephen John Smoogen <smo...@gmail.com> wrote:
> On Wed, 22 Jan 2020 at 06:04, Kevin Kofler <kevin.kof...@chello.at> wrote:
> >
> > Kevin Kofler wrote:
> > > IMHO, this whole "delete by default" concept is inherently flawed and
> > > dangerous and cannot be fixed. Notification e-mails can be lost in so many
> > > ways (wrong Fedora notification settings, e-mail provider issues, spam
> > > filter false positives, out-of-quota mailbox, etc.) or be missed due to
> > > being offline for a prolonged period of time. It should never be allowed
> > > to delete users' data without their explicit confirmation. Especially in
> > > this case, where it is not even possible to reupload the data because Copr
> > > can no longer build for those EOL chroots (which is another quite annoying
> > > limitation of Copr – allowing builds for EOL releases would also let
> > > people try backporting select security fixes to releases Fedora no
> > > longer wants to support).
> >
> > PS: I also think that, at the very least, there ought to be a way to
> > permanently opt a Copr repository out of all future cleanups. Some Coprs,
> > such as the Kannolo Copr, should just always be preserved.
> >
> > I also do not understand why 6 TB of disk space is such an issue in times
> > when a single HDD can carry up to 16 TB.
>
> I think I have written something like the following every 3-4 years.
> Every time newer, larger disks come out, there is some sort of magical
> effect that makes people forget why they don't solve the problems that
> koji, Copr, and the mirrors have when dealing with storage.
>
> 1. A single disk can carry a lot of data, and can be reasonably fast
> for one person. Start adding more people with different-sized
> reads/writes, though, and that disk's performance drops completely.
> 2. SATA disks are cheap to buy but very expensive to use, because the
> logic on the controller and the disk is dumb. Add mixed read/write
> sizes, as you do with a large number of users, and the actual
> performance dies.
> 3. Money for hardware, software, and the place to put said
> hardware/software needs to be accounted for, so just because you ran
> out of disk space, you will have to wait until the next business cycle
> to get money to buy more. You will also be competing with every other
> group that wants money for its things. This isn't any different at a
> university or a company.
>
> So we could go buy 1-4 very large disks, and no person using COPR
> would ever see anything you compiled, because they and 10k other
> charging bison would be trying to get data from those disks. The IO
> falls through the floor or becomes a square waveform. I say this from
> the experience of trying this several times, because someone read that
> they could buy 4 <fill in new giant disk size> drives and it would
> quadruple what they could get from our shared storage. Then we go
> through all the magical kernel and filesystem options they found that
> 'make it work for someone else'... and then we end up having to buy
> more disks anyway, because the fundamental problem is that drinking a
> milkshake through a single cocktail straw doesn't work very well. You
> need more cocktail straws, or a bigger straw, or something less thick
> than a milkshake.
>
> So you need more read/write heads to get the data to people at a rate
> acceptable to their patience. We can normally do that with 8 SAS disks
> (expensive, smaller disks with a lot more onboard logic to handle
> data-ordering problems) or 12-16 SATA disks for the same IO
> performance. With SATA arrays, you end up with unusable space if you
> are actively reading/writing a lot: yes, you have 200 TB in that
> array, but you can only effectively use 40 to 100 TB of it at a high
> IO rate. With SAS, what you have is pretty much what you can use.
> If you have a workload like backups, or something similar where you
> are only doing writes, or only doing reads from a SMALL number of
> processes, you can use a large SATA array and use all of it. If
> instead you are using it for a lot of processes wanting lots of
> different data, or writing different data (aka COPR), you effectively
> starve yourself by using it all, because there is not enough
> bandwidth. [I guess an analogy would be using 36+ GB of memory on a
> 32-bit processor.]

I'm not trying to suggest that Ceph would be a magic bullet here, and it comes with its own costs. Nevertheless, since I work on the RH storage team, it occurred to me that the performance and scale problems you've touched on are exactly the problem space the Ceph project tries to address.

- Ken
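The single-cocktail-straw effect described above — one disk is fast for one reader but collapses under many — can also be sketched with a simple seek-penalty model. The latency and transfer figures are illustrative assumptions for a 7.2k-RPM SATA disk, not measured values:

```python
# Toy model of why one big HDD collapses under many concurrent readers:
# a single sequential stream reads at the media rate, but interleaved
# requests from many clients force a head seek before nearly every read.
# All figures are illustrative assumptions for a 7.2k-RPM SATA disk.
SEEK_MS = 8.0        # assumed average seek + rotational latency (ms)
STREAM_MBPS = 150.0  # assumed sequential transfer rate (MB/s)
BLOCK_KB = 64        # request size (KB)

def throughput_mbps(streams: int) -> float:
    """Effective MB/s under the model: one stream stays sequential;
    with more streams, every request pays a full seek."""
    block_mb = BLOCK_KB / 1024
    transfer_ms = block_mb / STREAM_MBPS * 1000
    per_request_ms = transfer_ms if streams == 1 else SEEK_MS + transfer_ms
    return block_mb / (per_request_ms / 1000)

print("1 stream    :", round(throughput_mbps(1), 1), "MB/s")   # -> 150.0
print("many streams:", round(throughput_mbps(50), 1), "MB/s")
```

In this toy model, going from one sequential reader to many interleaved ones drops effective throughput by roughly a factor of 20 — the straw, not the size of the glass, is the bottleneck, which is why adding read/write heads (spindles) helps where adding capacity does not.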
_______________________________________________
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org