Re: [pkg-discuss] Code review: making transport deal with poisonous web caches

johansen Mon, 05 Mar 2012 15:21:31 -0800

On Tue, Mar 06, 2012 at 11:55:42AM +1300, Tim Foster wrote:
> On 03/ 6/12 11:17 AM, [email protected] wrote:
> >On Tue, Mar 06, 2012 at 10:08:00AM +1300, Tim Foster wrote:
> >>I've got an incremental, and new webrev at:
> >>https://cr.opensolaris.org/action/browse/pkg/timf/poisoned-cache-v2/
> >
> >On lines 1286-1298 and 1887-1899 you're not recording the fact that
> >you've seen a content error, which is going to skew the repository
> >statistics for future choices.
> 
> Fair point - I'm hand-wringing a bit here, as to whether a fast
> cache that occasionally returns bad content should be treated
> differently to a slow or unavailable repository.


I'm not sure what to make of this either.  I wondered if this should be
a per-publisher policy option.  If the user has configured more than one
repository for a particular publisher, then we'd probably want to choose
the one that's fastest and has the lowest error rate.  However, I
understand that you're seeing this problem for system repositories,
where another location may not be configured or available.  Another
option might be to only implement that policy for the system repo, but
it does seem useful in other situations.
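
A minimal sketch of what such a per-publisher policy could look like. It assumes a hypothetical RepoStats record with a throughput figure and a combined error count, plus a hypothetical prefer_reliable knob. None of these names come from pkg(5)'s actual stats.py or configuration.

```python
from dataclasses import dataclass


@dataclass
class RepoStats:
    """Hypothetical per-repository stats; not the actual pkg stats.py class."""
    url: str
    speed: float        # observed throughput, e.g. bytes/second
    requests: int = 0
    errors: int = 0     # all failures, content and transient

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def choose_repo(candidates, prefer_reliable=False):
    """Pick one repository for a publisher.

    prefer_reliable is a hypothetical per-publisher policy knob:
    when set, the lowest error rate wins and speed only breaks ties;
    otherwise the fastest repository wins.
    """
    if not candidates:
        raise ValueError("publisher has no configured repositories")
    if prefer_reliable:
        return min(candidates, key=lambda r: (r.error_rate, -r.speed))
    return max(candidates, key=lambda r: r.speed)
```

With a fast cache that returns bad content 30% of the time and a slower clean origin, the default picks the cache and the reliable policy picks the origin. With only one candidate, which is the usual system-repository case, both policies return the same repository. That is why a selection policy alone doesn't solve the problem there.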

I also wondered if it would just be simpler to provide an override
option on the command line, but that would require users to give up and
re-try the entire installation, which isn't great either.

> >You'll also want to add a content_errors property to the
> >repo statistics, in stats.py.  Then, instead of doing a big dance with
> >adjusting the retry count on the fly, simply record a content error when
> >verification fails.  At the beginning of the 'for d, v in
> >self.__gen_repo()' loop, you can check whether you're on a subsequent
> >iteration, and if so, look at the repostats to determine if you've seen
> >a content error.
> 
> I'll give that a go. Recording content errors seems like a good
> start, and perhaps might allow us to take different actions on
> repositories with content errors in the future.
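
A rough sketch of the suggested structure: record a content error when verification fails, then consult the stats on later iterations instead of adjusting a retry count on the fly. Several things are assumptions. RepoStats, record_error(), the fetch/verify callables and the plain list of URLs (standing in for self.__gen_repo()) are all hypothetical. The thread also doesn't say what to do once a prior content error is seen, so sending an HTTP "Cache-Control: no-cache" request header to make intermediate caches revalidate is only an illustration.

```python
class RepoStats:
    """Hypothetical stand-in for per-repository statistics in stats.py."""

    def __init__(self, url):
        self.url = url
        self.content_errors = 0    # responses that failed verification
        self.transient_errors = 0  # timeouts, resets, 5xx, etc.

    def record_error(self, content=False):
        if content:
            self.content_errors += 1
        else:
            self.transient_errors += 1


def fetch_verified(repos, stats, fetch, verify):
    """Try each repo in turn; return the first payload that verifies.

    repos:  iterable of repository URLs (stands in for __gen_repo()).
    stats:  dict mapping URL -> RepoStats.
    fetch:  callable(url, headers) -> bytes (hypothetical transport hook).
    verify: callable(bytes) -> bool (e.g. a hash comparison).
    """
    for attempt, url in enumerate(repos):
        headers = {}
        # On a subsequent iteration, look at the stats: if this repo has
        # already served bad content, ask caches not to reuse a stored copy.
        if attempt > 0 and stats[url].content_errors:
            headers["Cache-Control"] = "no-cache"
        payload = fetch(url, headers)
        if verify(payload):
            return payload
        # Verification failed: record it so future repo choices see it.
        stats[url].record_error(content=True)
    raise RuntimeError("no repository returned verifiable content")
```

Here the retry behaviour is derived from recorded state rather than from a mutable retry counter. The same content_errors figure is also available to whatever code later ranks repositories.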

We do sort of take a different action now, in the sense that a content
error is weighted differently than a transient error.  Repositories that
constantly give us content errors eventually fall out of favor, but I
suppose we could further differentiate between incorrectly cached
content and persistently corrupted content.
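
One way to picture "weighted differently" is as a per-repository penalty score where content errors cost more than transient ones. The weights and the scoring formula below are illustrative assumptions, not the values or the method used in pkg's stats.py.

```python
from dataclasses import dataclass

# Hypothetical weights: a content error is penalized five times as
# heavily as a transient error. The real weighting in pkg may differ.
TRANSIENT_WEIGHT = 1.0
CONTENT_WEIGHT = 5.0


@dataclass
class RepoStats:
    url: str
    successes: int = 0
    transient_errors: int = 0
    content_errors: int = 0

    def score(self) -> float:
        """Lower is better: weighted error penalty per attempt."""
        attempts = self.successes + self.transient_errors + self.content_errors
        if attempts == 0:
            return 0.0  # untried repositories start out in favor
        penalty = (TRANSIENT_WEIGHT * self.transient_errors
                   + CONTENT_WEIGHT * self.content_errors)
        return penalty / attempts


def rank(repos):
    """Order repositories from most to least preferred."""
    return sorted(repos, key=lambda r: r.score())
```

Under these weights, a repository with one content error in ten attempts (score 0.5) ranks below one with two transient errors in ten (score 0.2). A repository that keeps serving bad content drifts steadily to the bottom.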

> Apart from the bug report, I don't know how prevalent this problem
> is in the wild, but it certainly felt like something we should be
> addressing. I'll go back to the drawing board, and send another
> webrev when I'm done.   Thanks for the suggestions.

Me neither, really.  Shawn and I discussed this a bit, long ago.  But I
can't remember if it was in response to a problem we saw, or if it was
just theoretical.

Sorry to throw a monkey wrench into things this late in the review
process.  My hope is that this will actually simplify the code.

-j
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss
