On 7 June 2013 13:48, Matthew Miller <mat...@fedoraproject.org> wrote:

> On Fri, Jun 07, 2013 at 01:31:36PM -0600, Stephen John Smoogen wrote:
> > The easiest way I could see is just get a better sampling method which
> > would be to have funding for a mirror which we then put into mirror-manager
> > and we know that this is a sampling versus a request info. (basically we
> > would see what packages are downloaded directly and then extend that sample
> > from the amount of downloads to the 500,000 systems that check in via
> > mirrormanager). The problems involved are paying for systems, storage, and
> > bandwidth for such items.
>
> Maybe one of the mirrors would be able to provide logs?
>
>
Possibly. In the past mirror admins have not wanted to do so for many
reasons: they can't keep logs longer than 24 hours for policy reasons; they
can't hand over logs without a formal agreement, and then only with as much
redacted as possible; or if they do it for X then they have to do it for
everyone, so no thank you. When I was at my university gig, the request had
to go up 4 levels of management before I gave up at the sub-CIO level.

I have tried looking at the top-level mirrors, but most of the data is
swamped by other sites mirroring and by lots of people doing development
work and pointing at the repos directly. This led to some strange
statistics: even after pulling out most of the noise, various packages would
"stand out" until I realized they were being pulled in for cross-compiles
and such (or by the site that likes to do partial mirrors every couple of
hours but always pulls in the same 4 packages each time, even when it pulls
in others).

I expect other mirrors are going to run into the same thing. That means the
data a lot of sites could give out (just the URLs per day, versus IP address
plus URL) would have a lot of weird noise. It would make, say, zvbi show up
high because it is both getting mirrored as the last package on the server
and because 8 packages depend on it (not true, but I can't remember the
package that showed up a ton).
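
To illustrate why having the IP address alongside the URL matters for
cleaning, here is a minimal sketch in Python. It assumes log records are
already reduced to (client_ip, url) pairs, and the cutoff for "looks like a
mirror sync" is a made-up number; neither comes from any real mirror's log
format or policy.

```python
from collections import defaultdict


def package_counts(records, max_urls_per_client=3):
    """Count distinct clients per URL, skipping clients that look like mirrors.

    records: iterable of (client_ip, url) pairs (hypothetical log format).
    max_urls_per_client: hypothetical cutoff; a client that fetched more
    distinct URLs than this is treated as a bulk or mirror sync and ignored.
    """
    urls_by_client = defaultdict(set)
    for client_ip, url in records:
        urls_by_client[client_ip].add(url)

    counts = defaultdict(int)
    for client_ip, urls in urls_by_client.items():
        if len(urls) > max_urls_per_client:
            continue  # probably a mirror or bulk sync, not an end user
        for url in urls:
            counts[url] += 1  # one count per client, not per request
    return dict(counts)
```

With URLs-per-day only, the first loop is impossible: there is no client key
to group by, so mirror syncs and repeated pulls cannot be separated from
ordinary installs.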

In either case, that is what got me to realize that we need a mirror of our
own to get better statistics of this sort, because the raw data can then be
cleaned as needed rather than arriving pre-cleaned and having to be
reanimated.
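
The extrapolation step from the quoted proposal could be sketched roughly as
below. This is only an illustration: it assumes the clients seen on the
sample mirror are representative of everyone checking in via mirrormanager,
which nothing in this thread establishes, and the function and parameter
names are hypothetical. The 500,000 default is the figure quoted above.

```python
def extrapolate(sample_counts, sample_clients, population=500_000):
    """Scale per-package client counts from one mirror up to the population.

    sample_counts: dict of package name -> distinct clients that fetched it
                   from the sample mirror.
    sample_clients: total distinct clients seen on the sample mirror.
    population: systems checking in via mirrormanager (figure from the thread).
    """
    if sample_clients <= 0:
        raise ValueError("sample_clients must be positive")
    scale = population / sample_clients
    return {pkg: round(n * scale) for pkg, n in sample_counts.items()}
```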


-- 
Stephen J Smoogen.
_______________________________________________
epel-devel mailing list
epel-devel@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/epel-devel
